Abstract

An information theoretic formulation of the distributed averaging problem previously studied in computer science
and control is presented. We assume a network with $m$ nodes each observing a WGN source. The nodes communicate
and perform local processing with the goal of computing the average of the sources to within a prescribed mean
squared error distortion. The network rate distortion function $R^*(D)$ for a 2-node network with correlated Gaussian
sources is established. A general cutset lower bound on $R^*(D)$ is established and shown to be achievable to within
a factor of 2 via a centralized protocol over a star network. A lower bound on the network rate distortion function
for distributed weighted-sum protocols, which is larger in order than the cutset bound by a factor of $\log m$, is
established. An upper bound on the network rate distortion function for gossip-based weighted-sum protocols, which
is only $\log \log m$ larger in order than the lower bound for a complete graph network, is established. The results
suggest that using distributed protocols results in a factor of $\log m$ increase in order relative to centralized protocols.

Index Terms 

Lossy source coding, distributed averaging, gossip algorithms. 
Distributed averaging is a popular example of the distributed consensus problem, which has received much
attention recently due to interest in applications ranging from distributed coordination of autonomous agents to
distributed computation in sensor networks, ad-hoc networks, and peer-to-peer networks (e.g., see [1]-[7]).

In this paper, we present a lossy source coding formulation of the distributed averaging problem. We assume that
each node in the network observes a source $X_i$, $i = 1, 2, \ldots, m$, and the nodes communicate and perform local
processing with the goal of computing the average $S = (1/m) \sum_{i=1}^{m} X_i$ to within a prescribed mean squared error
distortion. We investigate the network rate distortion function, defined as the infimum of average per-node rates that
achieve the desired distortion, both in general and for the class of distributed weighted-sum protocols, which includes
gossip-based weighted-sum protocols. Our results, which are information-theoretic, shed light on the fundamental
tradeoff in distributed computing between communication and computation accuracy, and on the communication penalty
of using distributed rather than centralized protocols.

The paper is organized as follows. In Section II, we define the lossy averaging problem and summarize our 
results. In Section III we briefly review recent work on distributed averaging and discuss how our approach differs 
from this previous work. In Section IV, we establish the network rate distortion function for a 2-node network. In 
Section V, we establish a lower bound on the number of communication rounds needed to achieve a prescribed 
distortion, and a cutset lower bound on the network rate distortion function. In Section VI, we investigate the 
lossy averaging problem for the class of distributed weighted-sum protocols. We establish a lower bound on the 
network rate distortion function for this class as well as upper and lower bounds for gossip-based weighted-sum
protocols. 

The paper will generally use the notation in [8]. 

II. Definitions and Summary of Results 

Consider a network with $m$ sender-receiver nodes, where node $i = 1, 2, \ldots, m$ observes an i.i.d. source $X_i$.
The nodes communicate and perform local processing with the goal of computing the average of the sources
$S = (1/m) \sum_{i=1}^{m} X_i$ at each node to within a prescribed per-letter distortion $D$. The following definitions apply
to any set of correlated i.i.d. sources $(X_1, X_2, \ldots, X_m)$. In Sections V and VI, we assume the sources to be
independent white Gaussian noise (WGN) processes each with average power of one.
The topology of the network is specified by a connected graph with no self loops $(\mathcal{M}, \mathcal{E})$, where $\mathcal{M} =
\{1, 2, \ldots, m\}$ is the set of nodes and $\mathcal{E}$ is a set of undirected edges (node pairs) $\{i, j\}$, $i, j \in \mathcal{M}$ and $i \neq j$.
Communication is performed in rounds and each round is divided into time slots. Each round may consist of a
different number of time slots, and each time slot may consist of a different number of transmissions. One edge
(node pair) is chosen in each round and only one node is allowed to transmit in each time slot. Although in general
multiple edges may be chosen in the same round and multiple nodes may be allowed to transmit in a time slot, we
restrict ourselves to one edge and one node at a time to simplify the analysis. Without loss of generality we assume
that the selected node pair communicates in a round robin manner, with the first node communicating in odd time
slots and the second node communicating in even time slots. Further, we assume a source coding setting, where
communication is noiseless and instant, that is, every transmitted message is successfully received by the intended
receiver in the same time slot it is transmitted in. As in most multiple-user information theoretic setups, we assume
that each node has a sequence of $n$ symbols of its source before communication and computing commence. We
seek to find the limit on the tradeoff between communication and distortion as $n$ tends to infinity.

Communication and local computing in the network are performed according to an agreed upon $(T, p(e), R, n)$
averaging protocol that consists of:

1) The number of communication rounds $T$.

2) A probability mass function $p(e)$ for $e \in \mathcal{E}^T$, where $\mathcal{E}^T = \{(e_1, e_2, \ldots, e_T) : e_t \in \mathcal{E}, t = 1, 2, \ldots, T\}$ is the
set of all feasible edge selection sequences.

3) A set of $(e, R(e), n)$ block codes, one for each edge selection sequence $e \in \mathcal{E}^T$. Each $(e, R(e), n)$ block
code consists of:

a) A set of encoding functions, one for each node in each round and each time slot. Suppose that in round
$t \in [1 : T] = \{1, 2, \ldots, T\}$, edge $e_t = \{i, j\}$ is selected, and this round consists of $q_t \ge 1$ time slots.
Without loss of generality, assume that $q_t$ is even, node $i$ transmits in odd time slots, and node $j$ transmits
in even time slots. In time slot $\nu \in \{1, 3, \ldots, q_t - 1\}$, node $i$ sends an index $w_{t\nu}(x_{i1}^n, w_{t1}^{\nu-1}, W_{i1}^{t-1}) \in [1 :
2^{n r_{t\nu}}]$, where $W_{i\tau} = \{w_{\tau 1}^{q_\tau} : i \in e_\tau\}$, and thus $W_{i1}^{t-1}$ is the collection of indices node $i$ has up to time
$t - 1$. Similarly, node $j$ sends an index $w_{t\nu}(x_{j1}^n, w_{t1}^{\nu-1}, W_{j1}^{t-1}) \in [1 : 2^{n r_{t\nu}}]$ in time slot $\nu \in \{2, 4, \ldots, q_t\}$.
The total transmission rate per source symbol of node $i$ in round $t$ is

$$r_i(t) = \sum_{\nu \in \{1, 3, \ldots, q_t - 1\}} r_{t\nu},$$

and similarly for node $j$, the total transmission rate is $r_j(t) = \sum_{\nu \in \{2, 4, \ldots, q_t\}} r_{t\nu}$.

b) A set of decoding functions, one for each node. At the end of round $T$, the decoder for node $i \in
\mathcal{M}$ assigns an estimate $\hat{s}_{i1}^n(x_{i1}^n, W_{i1}^T) = (\hat{s}_{i1}(x_{i1}^n, W_{i1}^T), \hat{s}_{i2}(x_{i1}^n, W_{i1}^T), \ldots, \hat{s}_{in}(x_{i1}^n, W_{i1}^T))$ of the average
$s_1^n = (s_1, s_2, \ldots, s_n)$ to each source sequence and all messages received by the node, where $s_k =
(1/m) \sum_{i=1}^{m} x_{ik}$.

Let $r_i(t) = 0$ if node $i$ is not selected in round $t$. Then the total transmission rate for node $i \in \mathcal{M}$ is defined as

$$R_i = \sum_{t=1}^{T} r_i(t).$$

The per-node transmission rate is defined as

$$R(e) = \frac{1}{m} \sum_{i=1}^{m} R_i$$

and the average per-letter distortion is defined as

$$D(e) = \frac{1}{mn} \sum_{i=1}^{m} \sum_{k=1}^{n} E\left((S_k - \hat{S}_{ik})^2\right),$$

where $S_k = (1/m) \sum_{i=1}^{m} X_{ik}$ and the expectation is taken over the source statistics. Finally, the expected per-node
transmission rate is defined as

$$R = \sum_{e \in \mathcal{E}^T} p(e) R(e).$$
For a fixed $T$ and $p(e)$, a rate distortion pair $(R, D)$ is said to be achievable if there exists a sequence of
$(T, p(e), R, n)$ code sets indexed by the block length $n$ such that

$$\limsup_{n \to \infty} \sum_{e \in \mathcal{E}^T} p(e) D(e) \le D.$$

The network rate distortion function $R^*(D)$ is defined as the infimum over the number of rounds $T$ and probability
mass functions $p(e)$ of all rates $R$ such that the pairs $(R, D)$ are achievable.

Centralized versus Distributed Protocols: A goal of our work is to quantify the communication penalty of using 
distributed relative to centralized protocols. In a distributed protocol, such as the distributed weighted-sum protocol
discussed in Section VI, the code used at each round is the same for all nodes, that is, it does not depend on the 
identities of the selected nodes. The code, however, can be time-varying, that is, can change with the round number. 
In a centralized protocol, on the other hand, the code can depend on the node identities. For example, a node may 
be designated as a "cluster-head" and treated differently by the centralized protocol than other nodes. 

As an example of distributed averaging protocols, we consider a class of distributed weighted-sum protocols. Before
defining this class of protocols, we briefly review rate distortion theory for a WGN source and the mean squared
error distortion measure [9]. In this setup, sender node 1 has a WGN source $X$ with average power $P$ and wishes
to send a description $\hat{X}$ of the source to node 2 with normalized distortion $d = D/P \in [0, 1]$. We assume standard
definitions for a code, distortion, achievability, and rate distortion function. Then, the rate distortion function is

$$R(D) = \min_{p(\hat{x}|x) :\, E((X - \hat{X})^2) \le Pd} I(X; \hat{X}) = \frac{1}{2} \log \frac{1}{d}.$$

The test channel $p(\hat{x}|x)$ that achieves the minimum can be expressed as

$$\hat{X} = (1 - d)(X + Z), \qquad (1)$$

where $Z \sim \mathcal{N}(0, Pd/(1 - d))$ is independent of $X$. The rate distortion function is achieved by using a sequence of
codebooks $\{\hat{x}^n(w) : w \in [1 : 2^{nR}]\}$, where each estimate (description) is generated independently according
to an i.i.d. $\mathcal{N}(0, P(1 - d))$ distribution, and joint typicality encoding [9]. We refer to such codes as Gaussian codes.

We are now ready to define the class of distributed weighted-sum protocols. We assume that the sources
$X_1, X_2, \ldots, X_m$ are independent WGN processes, each with average power one. A distributed weighted-sum
protocol is characterized by $(T, p(e), R, n, d)$, where $T$, $p(e)$, $R$, and $n$ are defined as before, and $d \in [0, 1]$
is a normalized local distortion. Let $S_i(0) = X_i$ for $i \in \mathcal{M}$ and fix $T$, $d$, and an edge selection sequence $e \in \mathcal{E}^T$.
Assuming edge $e_{t+1} = \{i, j\}$ is selected in round $(t + 1)$, define the test channels

$$\hat{S}_i(t) = (1 - d)(S_i(t) + Z_i(t)),$$
$$\hat{S}_j(t) = (1 - d)(S_j(t) + Z_j(t)),$$

where $Z_i(t)$ and $Z_j(t)$ are independent WGN sources with average powers $E(S_i(t)^2) d/(1 - d)$ and $E(S_j(t)^2) d/(1 -
d)$, respectively, and they are independent of $(X_1, X_2, \ldots, X_m)$ and $\{Z_l(\tau) : l \in e_{\tau+1}, \tau \in [0 : t - 1]\}$. Then the
expected distortion between $S_i(t)$ and the output of the test channel $\hat{S}_i(t)$ is

$$E\left((S_i(t) - \hat{S}_i(t))^2\right) = E(S_i(t)^2) d.$$

Similarly, the expected distortion between $S_j(t)$ and $\hat{S}_j(t)$ is $E(S_j(t)^2) d$. Now, define the updated sources

$$S_i(t + 1) = \frac{1}{2} S_i(t) + \frac{1}{2(1 - d)} \hat{S}_j(t), \qquad S_j(t + 1) = \frac{1}{2} S_j(t) + \frac{1}{2(1 - d)} \hat{S}_i(t), \qquad (2)$$

and $S_l(t + 1) = S_l(t)$ for $l \in \mathcal{M} \setminus \{i, j\}$. Note that since $\{S_i(0) : i \in \mathcal{M}\}$ and $\{Z_l(t) : l \in e_{t+1}, t \in [0 : T - 1]\}$ are
independent Gaussian and the test channels and update equations are linear, $S_i(t)$ is Gaussian for every $i \in \mathcal{M}$ and
$t \in [1 : T]$. These Gaussian test channels are then used to generate Gaussian codes that are revealed to all parties
prior to network operation.
Now we describe the $(e, R(e), n)$ block codes for a distributed weighted-sum protocol. Initially, each node $i \in \mathcal{M}$
has an estimate $s_{i1}^n(0) = x_{i1}^n$ of the true average $s_1^n$. In each round, communication is performed independently in
two time slots. Assume that edge $e_{t+1} = \{i, j\}$ is selected in round $t + 1$. In the first time slot, node $i$ uses its Gaussian
codes to describe the source $S_i(t)$ to node $j$. In the second time slot, node $j$ similarly describes $S_j(t)$ to node $i$
using its Gaussian codes. At the end of the second time slot, nodes $i$ and $j$ compute the updated estimates

$$s_{i1}^n(t + 1) = \frac{1}{2} s_{i1}^n(t) + \frac{1}{2(1 - d)} \hat{s}_{j1}^n(t), \qquad s_{j1}^n(t + 1) = \frac{1}{2} s_{j1}^n(t) + \frac{1}{2(1 - d)} \hat{s}_{i1}^n(t), \qquad (3)$$

respectively, where $\hat{s}_{i1}^n(t)$ and $\hat{s}_{j1}^n(t)$ are the estimates (descriptions) of $s_{i1}^n(t)$ and $s_{j1}^n(t)$. The estimate for node
$l \in \mathcal{M} \setminus \{i, j\}$ remains unchanged, that is, $s_{l1}^n(t + 1) = s_{l1}^n(t)$.

At the end of round $T$, node $i \in \mathcal{M}$ sets its final estimate of the average as $\hat{s}_{i1}^n = s_{i1}^n(T)$ if it is involved
in at least one round of communication, otherwise it sets $\hat{s}_{i1}^n = (1/m) s_{i1}^n(0)$. Thus for the degenerate distributed
weighted-sum protocol with $T = 0$, node $i \in \mathcal{M}$ sets its final estimate to $(1/m) s_{i1}^n(0)$.

We define the weighted-sum network rate distortion function $R^*_{\mathrm{WS}}(D)$ in the same manner as $R^*(D)$ except that
the codes used are restricted to the above class.

Remarks: 

1) The weights in the update equations (2) are chosen such that source $S_i(t)$ is the sum of a convex combination
of $(X_1, X_2, \ldots, X_m)$, with coefficients independent of the normalized local distortion $d$, and an independent
noise. Note that as we let $d$ approach zero, the update equations for the distributed weighted-sum protocol
reduce to those for the standard gossip algorithm [5].

2) The distributed weighted-sum protocol as defined above does not exploit the build up in correlation between 
the node estimates induced by communication and local computing. This correlation can be readily used to 
reduce the rate via Wyner-Ziv coding. However, we are not able to obtain general upper and lower bounds on 
the network rate distortion function with side information because the correlations between the estimates are 
time varying and depend on the particular edge selection sequence. We explore the rate reduction achieved 
by leveraging this correlation through simulations. 
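The second moments of the sources $S_i(t)$ can be tracked exactly for a fixed edge selection sequence, which gives a quick way to see the rate and distortion a weighted-sum protocol incurs. The sketch below is illustrative only. It assumes the update takes the form $S_i(t+1) = S_i(t)/2 + (S_j(t) + Z_j(t))/2$ (the test channel output rescaled by $1/(1-d)$), charges each description the rate $(1/2)\log_2(1/d)$ of the Gaussian code, and uses 0-based node labels; the function name `run_weighted_sum` is hypothetical. It does not model the special final-estimate rule for nodes that are never selected.

```python
import math


def run_weighted_sum(m, edges, d):
    """Return (per-node rate in bits, average distortion) after the given rounds.

    K is the covariance matrix of (S_1(t), ..., S_m(t)) for independent
    unit-power sources, and c[a] is the covariance of S_a(t) with the true
    average S, whose variance is 1/m.
    """
    K = [[1.0 if a == b else 0.0 for b in range(m)] for a in range(m)]
    c = [1.0 / m] * m
    bits = 0.0
    for i, j in edges:
        zi = K[i][i] * d / (1.0 - d)  # power of Z_i(t)
        zj = K[j][j] * d / (1.0 - d)  # power of Z_j(t)
        # Rows i and j are replaced by their average (noiseless part of the update).
        avg_row = [(K[i][b] + K[j][b]) / 2.0 for b in range(m)]
        for b in range(m):
            K[i][b] = K[j][b] = avg_row[b]
        # Columns i and j of the result are averaged in the same way.
        avg_col = [(K[a][i] + K[a][j]) / 2.0 for a in range(m)]
        for a in range(m):
            K[a][i] = K[a][j] = avg_col[a]
        K[i][i] += zj / 4.0  # node i adds Z_j(t)/2
        K[j][j] += zi / 4.0  # node j adds Z_i(t)/2
        c[i] = c[j] = (c[i] + c[j]) / 2.0
        bits += 2 * 0.5 * math.log2(1.0 / d)  # one description in each direction
    distortion = sum(1.0 / m - 2.0 * c[a] + K[a][a] for a in range(m)) / m
    return bits / m, distortion
```

For example, `run_weighted_sum(2, [(0, 1)], 0.2)` gives a per-node rate of $(1/2)\log_2 5$ bits and distortion $0.0625$, which is one quarter of the test channel noise power.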

We also consider gossip-based weighted-sum protocols $(T, Q, R, n, d)$, where $Q$ is an $m \times m$ stochastic matrix
such that $Q_{ij} = 0$ if $\{i, j\} \notin \mathcal{E}$. In each round of a gossip-based weighted-sum protocol, a node $i$ is selected
uniformly at random from $\mathcal{M}$. Node $i$ then selects a neighbor $j \in \{j : \{i, j\} \in \mathcal{E}\}$ with conditional probability
$Q_{ij}$. Note that this edge selection process is the same as the asynchronous time model in [5]. After the edge (node
pair) $\{i, j\}$ is selected, the code for the weighted-sum protocols described above is used. Thus, a gossip-based
weighted-sum protocol $(T, Q, R, n, d)$ is a distributed weighted-sum protocol $(T, p(e), R, n, d)$ with

$$p(e) = \frac{1}{m^T} \prod_{\{i, j\} \in \mathcal{E}} (Q_{ij} + Q_{ji})^{|\{t \in [1 : T] :\, e_t = \{i, j\}\}|}$$

for every $e \in \mathcal{E}^T$. We define the gossip-based network rate distortion function $R^*_{\mathrm{GWS}}(D)$ for the class of gossip-based
weighted-sum protocols in a similar way as $R^*_{\mathrm{WS}}(D)$. Note that $R^*(D) \le R^*_{\mathrm{WS}}(D) \le R^*_{\mathrm{GWS}}(D)$.
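The edge selection process described above can be sketched as follows. This is an illustrative sampler, not the authors' code: the names `sample_edge_sequence` and `edge_probability` are hypothetical, nodes are labeled from 0, and $Q$ is assumed to be given as a list of rows with zero diagonal (the graph has no self loops).

```python
import random


def sample_edge_sequence(Q, T, rng):
    """Draw T edges: pick node i uniformly, then neighbor j with probability Q[i][j]."""
    m = len(Q)
    sequence = []
    for _ in range(T):
        i = rng.randrange(m)
        j = rng.choices(range(m), weights=Q[i])[0]
        sequence.append(frozenset((i, j)))
    return sequence


def edge_probability(Q, i, j):
    """Probability that the undirected edge {i, j} is selected in a given round."""
    m = len(Q)
    # Either i is picked first and chooses j, or j is picked first and chooses i.
    return (Q[i][j] + Q[j][i]) / m
```

Since rounds are drawn independently, the probability of a whole sequence is the product of the per-round edge probabilities returned by `edge_probability`.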
The following result will be used in the bounds on $R^*(D)$, $R^*_{\mathrm{WS}}(D)$, and $R^*_{\mathrm{GWS}}(D)$.

Lemma 1: Consider the distributed weighted-sum protocol defined above for some fixed $T$, $d$, and edge selection
sequence $e \in \mathcal{E}^T$. Then the distortion $E(S_i(t)^2) d$ between $S_i(t)$ and $\hat{S}_i(t)$ is achievable if

$$r_i(t + 1) > \frac{1}{2} \log \frac{1}{d}$$

for every $i \in e_{t+1}$ and $t \in [0 : T - 1]$.



Proof: The lemma can be proved by the procedure in [8] for extending achievability of a lossy source coding
problem for finite sources and distortion measures to Gaussian sources with mean squared error distortion. We first
establish achievability for finitely quantized versions $[S_i(t)]$ of $S_i(t)$ for $i \in \mathcal{M}$ and $t \in [0 : T]$ and $[\hat{S}_i(t)]$ of $\hat{S}_i(t)$
for $i \in \mathcal{M}$ and $t \in [0 : T - 1]$. We then use the covering lemma, the joint typicality lemma, and the Markov lemma
to show that joint typicality encoding succeeds with high probability if

$$r_i(t + 1) > I([S_i(t)]; [\hat{S}_i(t)])$$

for every $i \in e_{t+1}$ and $t \in [0 : T - 1]$. When encoding succeeds, the distortion between $S_{i1}^n(t)$ and its description
$\hat{S}_{i1}^n(t)$ is close to $E([S_i(t)]^2) d$ for $i \in e_{t+1}$ and $t \in [0 : T - 1]$. Finally, taking appropriate limits, it can be readily
shown that the distortion $E(S_i(t)^2) d$ is achievable for the Gaussian sources and descriptions if

$$r_i(t + 1) > \frac{1}{2} \log \frac{1}{d}$$

for every $i \in e_{t+1}$ and $t \in [0 : T - 1]$. ■



A. Summary of Results 

Section IV: We establish $R^*(D)$ for a 2-node network with correlated Gaussian sources (Proposition 1).
Section V:

1) We establish a lower bound on the number of rounds needed to achieve distortion $D < 1/m^3$: $T \ge 2m - 3$
(Proposition 2).

2) For independent WGN sources, we establish the following cutset lower bound on the network rate distortion
function (Theorem 1)

$$R^*(D) \ge \frac{1}{2} \log \frac{m - 1}{m^2 D}.$$

The bound is tight for $m = 2$.

3) We show that a centralized protocol over a star network can achieve rates within a factor of 2 of the cutset
bound for large $m$ (Proposition 3). We establish a tighter cutset bound for this network and show that it
becomes tight as $D \to 0$.

Section VI: We investigate a class of distributed weighted-sum protocols, including gossip-based weighted-sum 
protocols for independent WGN sources. 

1) We establish the following lower bound on the network rate distortion function for distributed weighted-sum 
protocols (Theorem 2) 

RwsiD) >-( log^ ) f log^— 

^-2V VD + l/mJ\ 4mD 

This bound is larger than the cutset bound by a factor of log m in order. 

2) We establish the following bounds on the expected network rate distortion function for a class of gossip-based 
weighted-sum protocols (Theorem 3) 

R*Q^siD) = n((log^] flog ^ 



DJ \ ° mD 

RhwsiD) = O { , , flog ^) flog log ^ + log ■ 



m{l-\2) V Dj \ ^ ^ m?{l- \2)D ^ 

where $\lambda_2$ is the second largest eigenvalue of the expected averaging matrix [5]. The upper bound is shown
to hold for general independent sources, i.e., not necessarily Gaussian. For distortion $D = o(1/(m \log m))$, the
upper bound has the same order as the lower bound, while for distortion $D = \Omega(1/(m \log m))$, it is larger in
order than the lower bound by a factor of $\log \log m$. Since a centralized protocol can achieve the same order
as the cutset bound, and the lower bound on distributed weighted-sum protocols is $\log m$ larger in order than
the cutset bound and is achievable to within order $\log \log m$, the order $\log m$ factor represents the penalty of
using distributed protocols relative to centralized protocols.
3) We use simulations to explore the improvement in rate achieved by exploiting the correlation induced by 
communication and local computing. 
III. Relationship to Previous Work

Examples of work on distributed averaging under the synchronous model include [10], [11], where deterministic
linear iterative protocols are used. Each node iteratively computes the weighted sum of the estimates of its neighbors
and itself, that is, $s(t + 1) = A s(t)$, where $A$ is a nonnegative matrix with nonzero entries $a_{ij}$ only if there is
an edge between nodes $i$ and $j$. The results in [10] show that when $A$ is a doubly stochastic matrix, the network
achieves consensus average as $t \to \infty$. Furthermore, when the topology of the entire network is known, the optimal $A$
that achieves the fastest convergence can be computed via semidefinite programming. In [11], it is shown that if
$A$ is a stochastic matrix, the states converge to a common weighted sum of the initial states. The weights correspond
to the steady-state probabilities of the Markov chain associated with the stochastic matrix. If the initial states are
divided by the corresponding weights, the consensus average can be reached.
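The linear iteration $s(t + 1) = A s(t)$ is easy to reproduce numerically. The following minimal sketch uses a hypothetical 3-node path graph with self loops at the end nodes; the matrix below is an illustrative choice of a doubly stochastic $A$, not one taken from [10] or [11].

```python
def iterate(A, s, steps):
    """Apply s <- A s the given number of times (plain-Python matrix-vector product)."""
    for _ in range(steps):
        s = [sum(a * x for a, x in zip(row, s)) for row in A]
    return s


# Doubly stochastic matrix for the path 0 - 1 - 2 (rows and columns sum to one).
A_PATH = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
]
```

Starting from `[3.0, 0.0, 0.0]`, the states returned by `iterate(A_PATH, [3.0, 0.0, 0.0], 60)` are all close to the average 1.0, since the eigenvalues of this matrix other than 1 have magnitude 1/2.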

Synchronous protocols cannot be used in networks with link failures or dynamic topologies. This has motivated 
the development of the gossip protocol, which was first introduced for computing the average at a sink node in a 
peer-to-peer network [2] and later applied to distributed averaging (e.g., [5]). In each communication round, a node 
and one of its neighbors are selected. The two nodes update their estimates by averaging their current estimates. 
Note that this process does not change the average of the states in the network. In [12], it is assumed that in 
each communication round, a node and its neighbor are selected uniformly at random. The results provide order 
bounds on convergence time that hold with high probability. In [5], nodes select neighbors to communicate with 
according to a doubly stochastic matrix. Bounds on convergence time that hold with high probability are obtained 
as a function of the second largest eigenvalue of the matrix. The node selection statistics that minimize the second 
largest eigenvalue can be found by a distributed subgradient method. Motivated by wireless networks and the 
Internet, the paper also investigates distributed averaging for networks modeled by geometric random graphs and 
by preferential connectivity. It is shown that convergence time under preferential connectivity does not depend
on the size of the network. 
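The pairwise averaging step of the gossip protocol, and the fact that it leaves the network average unchanged, can be illustrated with a short sketch. The function name `gossip` and the uniform choice among edges are assumptions made for the example; the protocols in [5] and [12] use their own selection statistics.

```python
import random


def gossip(values, edges, rounds, rng):
    """Run pairwise-averaging gossip; each round one edge is drawn uniformly from `edges`."""
    s = list(values)
    for _ in range(rounds):
        i, j = rng.choice(edges)
        # Both endpoints replace their estimates by the pair average,
        # so the sum of all estimates is unchanged.
        s[i] = s[j] = (s[i] + s[j]) / 2.0
    return s
```

Each round keeps the sum of the estimates fixed and never moves an estimate outside the range of the initial values, which is why the estimates can only converge to the true average.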

In [11], a variation of the gossip-type protocol is studied for a network with link failures and dynamic topologies.
In each communication round, each node first broadcasts its current estimate to its neighbors. Node $i$ then makes an
offer to neighbor $j$ if $s_j(t) < s_i(t)$ and $s_j(t)$ is smaller than the states of other neighbors of node $i$. At the end of
this round, each node accepts the offer from the node with the largest estimate, and both nodes update their estimates
with their averages. It is proved in [11] that this protocol converges under certain connectivity constraints.

The aforementioned work involves the noiseless communication and computation of real numbers, which is 
unrealistic. The effect of quantization on distributed consensus has received attention only recently. In [13], 
quantization noise is modeled as additive white noise. It is shown that the expectation of the state vector converges 
to the average of the initial states, but the variance diverges. Further, it is shown that the mean square deviation of 
the state vector is bounded away from zero. The tradeoff between mean square deviation and convergence time is 
also investigated. Recognizing that the divergence of the consensus variance in [13] is due to the assumption that 
quantization noise is white, the work in [14] exploits the time and spatial correlation of the estimates of the nodes. 
The initial states are assumed to be random variables with zero mean and finite variance. A differential nested 
lattice encoding quantizer that combines predictive coding and Wyner-Ziv coding is used. At each round, node i 
updates its estimate with a weighted-sum of the estimates of its neighbors and itself and an additive quantization 
noise, hence the estimates $s_i(t)$ and $s_i(t + 1)$ are correlated. The time correlation is exploited by predictive coding
to reduce quantization error. The update process also increases the spatial correlation between nodes. As such,
the estimate of each node is used as side information to reconstruct descriptions from received quantized indices.
It is shown that the mean squared error is bounded when the optimal lattice vector quantizer is used, and the
transmission rate at round $t$ approaches zero as $t \to \infty$. The tradeoff between the rate per node per round and the
final mean squared error is also investigated through simulations. 

The work in [15] and [16] uses a different approach to quantized consensus. Each node has an integer-valued initial
estimate and the nodes wish to compute the quantized average consensus, which is reached when $s_i(t) \in \{\lfloor s \rfloor, \lceil s \rceil\}$
for $i = 1, 2, \ldots, m$, where $s$ is the average of the initial estimates. It is shown that simply uniformly quantizing the
estimates of the gossip-based protocol is not sufficient to achieve quantized average consensus and a gossip-based
protocol is shown to achieve it. In [16] the integer-valued averaging problem with exact consensus is investigated.
A probabilistic quantizer is added to the real-valued gossip-based protocol. The estimate $s_i(t)$ is quantized either
to $u = \lceil s_i(t) \rceil$ or $l = \lfloor s_i(t) \rfloor$ with probabilities $(s_i(t) - l)$ and $(u - s_i(t))$, respectively. The results show that the
expectation of the common estimate is the average $s$. In [17], different update rules for achieving quantized consensus
by using deterministic uniform quantizers and probabilistic quantizers are compared. In [18], [19], exchanging and
storing quantized information is considered for the consensus problem.
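The unbiasedness of a probabilistic quantizer of this kind can be seen from a short sketch. The code assumes unit quantization step and rounds up with probability equal to the distance from the floor; the function names are hypothetical and the sketch is not taken from [16].

```python
import math
import random


def probabilistic_quantize(s, rng):
    """Round s up with probability s - floor(s), otherwise round down."""
    lo, hi = math.floor(s), math.ceil(s)
    if lo == hi:  # s is already an integer
        return lo
    return hi if rng.random() < (s - lo) else lo


def expected_quantized_value(s):
    """Exact mean of the quantizer output: hi (s - lo) + lo (hi - s)."""
    lo, hi = math.floor(s), math.ceil(s)
    if lo == hi:
        return float(s)
    return hi * (s - lo) + lo * (hi - s)
```

Because $\lceil s \rceil - \lfloor s \rfloor = 1$ for non-integer $s$, the mean simplifies to $s$, so the quantizer output is an unbiased estimate of its input.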

The first information theoretic work on distributed averaging is reported in [20]. The nodes communicate through 
channels with finite capacities. Each node is required to compute a function of the initial values to within a desired 
mean squared error distortion. Lower and upper bounds on the time to achieve the desired distortion are shown to 
be inversely proportional to the graph conductance. 

Our work is related most closely to the work on quantized averaging in [13], [14] and the information theoretic 
work in [20]. Compared to previous work on quantized averaging, our information-theoretic approach to the 
problem deals naturally and fundamentally with quantization and the results provide limits that hold independent 
of implementation details. Our results are difficult to compare with the results in these papers, however, because 
of basic differences in the models and assumptions. While the work in [20] is information theoretic, it deals with 
a different formulation than ours and the results are not comparable. Our formulation of the distributed averaging 
problem can be viewed also as a generalization of the CEO problem [21], [22], where in our setting every node 
wants to compute the average and the communication protocol is significantly more complex in that it allows for 
interactivity, relaying, and local computing, in addition to multiple access. 

IV. $R^*(D)$ for 2-Node Network

Consider a network with 2 nodes and a single edge, and assume correlated WGN sources $(X_1, X_2)$ with covariance
matrix

$$\begin{bmatrix} P_1 & \rho \sqrt{P_1 P_2} \\ \rho \sqrt{P_1 P_2} & P_2 \end{bmatrix}.$$

Each node wishes to compute the weighted sum $g(X_1, X_2) = a_1 X_1 + a_2 X_2$, for some constants $a_1$ and $a_2$, to
within mean squared error distortion $D$.

For a 2-node network, there is only one round of communication with an arbitrary number of time slots. The 
interesting case is when distortion is small enough such that each node must transmit to the other node. The 
following proposition establishes the network rate distortion function for this regime.

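Proposition 1: The network rate distortion function for the 2-node network with correlated WGN sources is

$$R^*(D) = \frac{1}{2} \log \left( \frac{a_1 a_2 (1 - \rho^2) \sqrt{P_1 P_2}}{D} \right)$$

for $D < \min\{a_1^2 (1 - \rho^2) P_1, a_2^2 (1 - \rho^2) P_2\}$.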

Proof: The converse follows by a cutset bound argument given in the Appendix.
Achievability of the network rate distortion function follows by performing two independent Wyner-Ziv coding [23]
steps. In the first time slot, node 1 uses Wyner-Ziv coding to describe its source $X_1$ to node 2 at rate
$R_1 = (1/2) \log(a_1^2 (1 - \rho^2) P_1 / D)$. In the second time slot, node 2 uses Wyner-Ziv coding to describe its source
$X_2$ to node 1 at rate $R_2 = (1/2) \log(a_2^2 (1 - \rho^2) P_2 / D)$. At the end of the second time slot, nodes 1 and 2 compute
the estimates

$$\hat{g}_{11}^n = a_1 x_{11}^n + a_2 \hat{x}_{21}^n \quad \text{and} \quad \hat{g}_{21}^n = a_1 \hat{x}_{11}^n + a_2 x_{21}^n,$$

where $\hat{x}_{11}^n$ and $\hat{x}_{21}^n$ are the descriptions of $x_{11}^n$ and $x_{21}^n$, respectively. The average per-letter distortion between the
estimates and the objective weighted sum is

$$\begin{aligned}
\lim_{n \to \infty} \frac{1}{2n} \sum_{i=1}^{2} \sum_{k=1}^{n} E\left((g(X_{1k}, X_{2k}) - \hat{G}_{ik})^2\right)
&= \lim_{n \to \infty} \frac{1}{2n} \sum_{k=1}^{n} \left( a_2^2 E\left((X_{2k} - \hat{X}_{2k})^2\right) + a_1^2 E\left((X_{1k} - \hat{X}_{1k})^2\right) \right) \\
&= \frac{1}{2} \left( a_2^2 \frac{D}{a_2^2} + a_1^2 \frac{D}{a_1^2} \right) = D.
\end{aligned}$$
The per-node transmission rate is

$$\frac{R_1 + R_2}{2} = \frac{1}{2} \left( \frac{1}{2} \log \frac{a_1^2 (1 - \rho^2) P_1}{D} + \frac{1}{2} \log \frac{a_2^2 (1 - \rho^2) P_2}{D} \right) = \frac{1}{2} \log \left( \frac{a_1 a_2 (1 - \rho^2) \sqrt{P_1 P_2}}{D} \right).$$

This completes the proof of the proposition. ■
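As a numerical check of the achievability argument, the two Wyner-Ziv rates and their average can be compared with the closed form in Proposition 1. This is a sketch under stated assumptions: logarithms are taken base 2, the function names are hypothetical, and the closed form is evaluated with $|a_1 a_2|$ so that it is defined for weights of either sign.

```python
import math


def two_node_rates(a1, a2, P1, P2, rho, D):
    """Return (R1, R2, per-node rate) for the two Wyner-Ziv steps, in bits."""
    v1 = a1 ** 2 * (1.0 - rho ** 2) * P1
    v2 = a2 ** 2 * (1.0 - rho ** 2) * P2
    if not 0.0 < D < min(v1, v2):
        raise ValueError("D must satisfy 0 < D < min(a1^2 (1-rho^2) P1, a2^2 (1-rho^2) P2)")
    R1 = 0.5 * math.log2(v1 / D)
    R2 = 0.5 * math.log2(v2 / D)
    return R1, R2, (R1 + R2) / 2.0


def proposition1_rate(a1, a2, P1, P2, rho, D):
    """Closed-form per-node rate (1/2) log2(|a1 a2| (1 - rho^2) sqrt(P1 P2) / D)."""
    return 0.5 * math.log2(abs(a1 * a2) * (1.0 - rho ** 2) * math.sqrt(P1 * P2) / D)
```

For averaging two independent unit-power sources ($a_1 = a_2 = 1/2$, $\rho = 0$) at $D = 1/16$, both functions give a per-node rate of 1 bit.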
Remarks: 

1) In [24], Kaspi investigated the interactive lossy source coding problem when the objective is for each source 
to obtain a description of the other source. Our problem is different and as such Kaspi's results do not readily 
apply. In [25], the interactive communication problem for asymptotically lossless computation is investigated. 
Again their results do not apply to our setting because they do not consider loss. 

2) In the above example, it is optimal for each node to simply compress its own source and send the compressed 
version to the other node, that is, only two time slots and no intermediate computing are necessary and 
sufficient. Based on the results in [24], [25], we do not expect these conclusions to hold in general for 
non-Gaussian sources, other distortion measures, and other functions. 

3) Finding the rate distortion function for a 3-node network even with Gaussian sources is difficult because (i) 
there can be several possible feasible edge selection sequences and it is not a priori clear which sequence 
yields the optimal per-node rate, (ii) the codes allow relaying in addition to interactive communication and 
local computing, and (iii) it is not known if Gaussian codes are optimal. Results on a 3-node problem are 
reported in [26]. 

V. Lower Bound on $T$ and $R^*(D)$

In this section, we establish a general lower bound on the minimum number of rounds $T$ for any averaging
protocol. We then establish a cutset bound on $R^*(D)$ for independent WGN sources each with average power one.
Finally, we show that the cutset bound can be achieved within a factor of 2 via a centralized protocol over a star
network.

A. Lower bound on T 

We first establish the following lower bound on the minimum number of rounds needed by any averaging protocol.

Proposition 2: Every averaging protocol that achieves distortion $D < 1/m^3$ must use at least $T \ge 2m - 3$
rounds.

Proof: Let $\mathcal{E}_t$ be the set of edges selected in rounds $1, 2, \ldots, t$. Suppose that the graph $(\mathcal{M}, \mathcal{E}_t)$ is not connected.
Then, for each node $i \in \mathcal{M}$, there exists a node $j(i)$ such that there is no path between these two nodes. The
estimate $\hat{S}_{i1}^n$ of node $i$ is independent of the source $X_{j(i)1}^n$ and its distortion is then lower bounded by

$$\frac{1}{n} \sum_{k=1}^{n} E\left((S_k - \hat{S}_{ik})^2\right) \ge \frac{1}{n} \sum_{k=1}^{n} E\left(\left(\frac{1}{m} X_{j(i)k}\right)^2\right) = \frac{1}{m^2}.$$

Thus, if distortion $D < 1/m^3$ is to be achieved at the end of round $t$, the graph $(\mathcal{M}, \mathcal{E}_t)$ must be connected and
$t \ge m - 1$.

Now let $t \ge m - 1$ be the smallest round index such that the graph $(\mathcal{M}, \mathcal{E}_t)$ is connected. If the number of
rounds $T < t + m - 2$, then there exists at least one node $i \in \mathcal{M} \setminus e_t$, which is not selected after round $t - 1$.
However, the graph $(\mathcal{M}, \mathcal{E}_{t-1})$ is not connected, and thus the estimate of node $i$ must have distortion higher than
$1/m^2$. Then the distortion $D < 1/m^3$ cannot be achieved. Therefore, the number of rounds to achieve distortion
$D < 1/m^3$ is lower bounded by

$$T \ge t + m - 2 \ge 2m - 3.$$

■

Remark: The averaging protocol that achieves $R^*(D)$ does not necessarily have to use the smallest $T$. It may be
possible to use fewer bits by using more rounds. We do not, however, have a specific example of such a case.
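The counting argument behind the $2m - 3$ bound can be illustrated on the star-network edge sequence used later in this section. The sketch below is illustrative: the function names are hypothetical, nodes are labeled $1, \ldots, m$, and the check encodes only the necessary condition from the proof (every node must be selected in or after the first round at which the selected edges connect the graph).

```python
def star_sequence(m):
    """Edge sequence ({1,2}, ..., {1,m}, {1,m-1}, ..., {1,2}) with 2m - 3 rounds."""
    outward = [(1, k) for k in range(2, m + 1)]
    backward = [(1, k) for k in range(m - 1, 1, -1)]
    return outward + backward


def first_connected_round(m, seq):
    """Smallest round t such that the edges of rounds 1..t connect all m nodes."""
    parent = list(range(m + 1))  # union-find over nodes 1..m (index 0 unused)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = m
    for t, (i, j) in enumerate(seq, start=1):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
        if components == 1:
            return t
    return None


def meets_necessary_condition(m, seq):
    """True if every node is selected in or after the first connecting round."""
    t = first_connected_round(m, seq)
    if t is None:
        return False
    selected = {v for edge in seq[t - 1:] for v in edge}
    return selected == set(range(1, m + 1))
```

For $m = 5$ the star sequence has 7 rounds, becomes connected at round 4, and satisfies the condition, while dropping its last round violates it.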
B. Cutset Bound

Consider the $m$-node distributed lossy averaging problem when the sources $(X_1, X_2, \ldots, X_m)$ are independent
WGN processes each with average power one. We establish the following cutset lower bound on the network rate
distortion function.

Theorem 1: The network rate distortion function $R^*(D) = 0$ if $D \ge (m - 1)/m^2$ and is lower bounded by

$$R^*(D) \ge \frac{1}{2} \log \left( \frac{m - 1}{m^2 D} \right)$$

if $D < (m - 1)/m^2$.

Proof: With no communication, i.e., $R = 0$, the best estimate of each node is the MMSE estimate $\hat{S}_{ik} =
E(S_k | X_{i1}^n) = X_{ik}/m$ for $k \in [1 : n]$ and $i \in \mathcal{M}$. Let $U_{ik} = (1/m) \sum_{j \in \mathcal{M} \setminus \{i\}} X_{jk}$. Then the distortion in this case
is

$$\frac{1}{mn} \sum_{i=1}^{m} \sum_{k=1}^{n} E\left((S_k - \hat{S}_{ik})^2\right) = \frac{1}{mn} \sum_{i=1}^{m} \sum_{k=1}^{n} E(U_{ik}^2) = \frac{m - 1}{m^2}.$$

Thus, if $D \ge (m - 1)/m^2$, $R^*(D) = 0$.

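Next we consider $D < (m - 1)/m^2$. Fix $T$ and $e \in \mathcal{E}^T$ and consider an $(e, R(e), n)$ block code that achieves
distortion $D(e)$. Since only pairwise communications are allowed, the number of bits transmitted by all nodes is
equal to the number of bits received by all nodes. We consider the number of bits received by node $i$, denoted by
$n R_i(e)$. Let $W_i$ be the collection of indices sent from nodes $j \in \mathcal{M} \setminus \{i\}$ to node $i$. Then the estimate $\hat{S}_{i1}^n$ of node
$i$ is a function of its source $X_{i1}^n$ and the received message $W_i$. We can bound the receiving rate as follows

$$\begin{aligned}
n R_i(e) &\ge H(W_i) \ge H(W_i | X_{i1}^n) \ge I(U_{i1}^n; W_i | X_{i1}^n) \\
&= \sum_{k=1}^{n} \left( h(U_{ik} | U_{i1}^{k-1}, X_{i1}^n) - h(U_{ik} | U_{i1}^{k-1}, X_{i1}^n, W_i) \right) \\
&= \sum_{k=1}^{n} \left( h(U_{ik}) - h(U_{ik} | U_{i1}^{k-1}, X_{i1}^n, W_i, \hat{S}_{i1}^n) \right) \\
&\ge \sum_{k=1}^{n} \left( h(U_{ik}) - h(U_{ik} | X_{ik}, \hat{S}_{ik}) \right) \\
&= \frac{n}{2} \log \left( 2 \pi e \, \frac{m - 1}{m^2} \right) - \sum_{k=1}^{n} h\left( U_{ik} + \frac{1}{m} X_{ik} - \hat{S}_{ik} \,\Big|\, X_{ik}, \hat{S}_{ik} \right) \\
&\ge \frac{n}{2} \log \left( 2 \pi e \, \frac{m - 1}{m^2} \right) - \sum_{k=1}^{n} \frac{1}{2} \log \left( 2 \pi e \, E\left((S_k - \hat{S}_{ik})^2\right) \right) \\
&\ge \frac{n}{2} \log \left( \frac{m - 1}{m^2 D_i} \right),
\end{aligned}$$

where $D_i = (1/n) \sum_{k=1}^{n} E((S_k - \hat{S}_{ik})^2)$. The per-node transmission rate is lower bounded by

$$R(e) \ge \min_{(1/m) \sum_{i=1}^{m} D_i \le D(e)} \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \log \left( \frac{m - 1}{m^2 D_i} \right) = \frac{1}{2} \log \left( \frac{m - 1}{m^2 D(e)} \right),$$

which follows from Jensen's inequality and the distortion constraint $(1/m) \sum_{i=1}^{m} D_i \le D(e)$. For any probability
mass function $p(e)$ such that $\sum_{e \in \mathcal{E}^T} p(e) D(e) \le D$, the expected per-node transmission rate is lower bounded by

$$\sum_{e \in \mathcal{E}^T} p(e) R(e) \ge \sum_{e \in \mathcal{E}^T} p(e) \left( \frac{1}{2} \log \frac{m - 1}{m^2 D(e)} \right) \ge \frac{1}{2} \log \frac{m - 1}{m^2 D}.$$

Since $T$ is arbitrary, the network rate distortion function is also lower bounded by

$$R^*(D) \ge \frac{1}{2} \log \left( \frac{m - 1}{m^2 D} \right).$$

■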
Remarks:

1) As can be readily verified from Proposition 1, the above lower bound is achieved for m = 2.

2) In the following subsection, we show that a centralized protocol for a "star" network can achieve a rate less 
than twice the cutset bound for sufficiently large m and small D. 

3) In the above proof, we considered only m cuts. Can the bound be improved by considering more cuts? Based 
on our investigations, the answer appears to be negative. 

4) The above cutset lower bound can be readily extended to correlated WGN sources and weighted-sum 
computation. The resulting bound is tight for m = 2 as shown in the previous section. 
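
The cutset bound of Theorem 1 is easy to evaluate numerically. The following is a minimal sketch, assuming logarithms to base 2 (rates in bits); the function name `cutset_lower_bound` is introduced here for illustration only.

```python
import math


def cutset_lower_bound(m: int, D: float) -> float:
    """Cutset lower bound of Theorem 1 (assumption: log base 2, i.e. bits)."""
    # (m - 1) / m^2 is the distortion reached with no communication at all.
    zero_rate_distortion = (m - 1) / m**2
    if D >= zero_rate_distortion:
        return 0.0
    return 0.5 * math.log2((m - 1) / (m**2 * D))
```

For example, `cutset_lower_bound(2, 1/16)` evaluates the bound for the two-node network at distortion 1/16, and any D at or above (m − 1)/m² returns zero.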

C. Upper Bound on $R^*(D)$ for Star Network

Consider a star network (or any network that contains a star network as a subnetwork) with m nodes and m − 1 edges $\mathcal{E} = \{\{1,2\}, \{1,3\}, \ldots, \{1,m\}\}$. For this network, we can establish the following upper bound on the network rate distortion function.

Proposition 3: The network rate distortion function for the star network is upper bounded by

$$R^*(D) \le \frac{m-1}{m}\log\left(\frac{2(m-1)^2}{m^3 D}\right)$$

for $D < (m-1)/m^2$.

Proof: We use the following centralized protocol where node 1 is treated as a "cluster head." The centralized protocol has T = 2m − 3 rounds. The probability mass function $p(e) = 1$ for the edge selection sequence $e = (\{1,2\}, \{1,3\}, \ldots, \{1,m\}, \{1,m-1\}, \ldots, \{1,2\})$. There are two time slots in round m − 1 and one time slot in each of the remaining rounds. We first define the test channels

$$\hat X_i = (1-d)(X_i + Z_i)$$

for $i \in \mathcal{M}\setminus\{1\}$, where $(Z_2, Z_3, \ldots, Z_m)$ are independent WGN sources, each $Z_i$, $i \in \mathcal{M}\setminus\{1\}$, has average power $d/(1-d)$, and they are independent of $(X_1, X_2, \ldots, X_m)$. Define the sources

$$\hat S_1 = \frac{1}{m}X_1 + \frac{1}{m}\sum_{j \in \mathcal{M}\setminus\{1\}} \hat X_j$$

and

$$U_i = \frac{1}{m}X_1 + \frac{1}{m}\sum_{j \in \mathcal{M}\setminus\{1,i\}} \hat X_j$$

for $i \in \mathcal{M}\setminus\{1\}$. Since $\hat S_1$ and $U_i$ for $i \in \mathcal{M}\setminus\{1\}$ are linear functions of WGN sources, they are also WGN sources. Now define the test channels

$$\hat U_i = (1-d_1)(U_i + \tilde Z_i)$$

for $i \in \mathcal{M}\setminus\{1\}$, where $(\tilde Z_2, \tilde Z_3, \ldots, \tilde Z_m)$ are independent WGN sources, each $\tilde Z_i$, $i \in \mathcal{M}\setminus\{1\}$, has average power $E(U_i^2)\,d_1/(1-d_1)$, and they are independent of $(X_1, X_2, \ldots, X_m)$ and $(Z_2, Z_3, \ldots, Z_m)$. Then $\hat U_i$ is Gaussian for $i \in \mathcal{M}\setminus\{1\}$. Define the sources

$$\hat S_i = \frac{1}{m}X_i + \hat U_i$$

for $i \in \mathcal{M}\setminus\{1\}$, which are also Gaussian. We use these Gaussian test channels to generate Gaussian codes for $X_i$ and $U_i$ for every $i \in \mathcal{M}\setminus\{1\}$. These codes are revealed to all parties.

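From the above test channels, we can readily compute the expected distortion between the average S and the estimate $\hat S_1$ for node 1:

$$E\big((S - \hat S_1)^2\big) = E\left(\Big(\frac{1}{m}\sum_{i=2}^{m}(X_i - \hat X_i)\Big)^2\right) = \frac{1}{m^2}\sum_{i=2}^{m} E\big((X_i - \hat X_i)^2\big) = \frac{m-1}{m^2}\,d.$$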
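The source $U_i$ has the following two properties:

$$E\left(\Big(U_i - \frac{1}{m}\sum_{j \in \mathcal{M}\setminus\{i\}} X_j\Big)^2\right) = E\left(\Big(\frac{1}{m}\sum_{j \in \mathcal{M}\setminus\{1,i\}} (\hat X_j - X_j)\Big)^2\right) = \frac{m-2}{m^2}\,d$$

and

$$E\big(U_i^2\big) = E\left(\Big(\frac{1}{m}X_1 + \frac{1}{m}\sum_{j \in \mathcal{M}\setminus\{1,i\}} \hat X_j\Big)^2\right) = \frac{1}{m^2} + \frac{m-2}{m^2}(1-d)$$

for $i \in \mathcal{M}\setminus\{1\}$. The distortion for node $i \in \mathcal{M}\setminus\{1\}$ is

$$\begin{aligned}
E\big((S - \hat S_i)^2\big) &= E\left(\Big(\hat U_i - U_i + U_i - \frac{1}{m}\sum_{j \in \mathcal{M}\setminus\{i\}} X_j\Big)^2\right)\\
&= E\big((\hat U_i - U_i)^2\big) + E\left(\Big(U_i - \frac{1}{m}\sum_{j \in \mathcal{M}\setminus\{i\}} X_j\Big)^2\right)\\
&= \Big(\frac{1}{m^2} + \frac{m-2}{m^2}(1-d)\Big)d_1 + \frac{m-2}{m^2}\,d.
\end{aligned}$$

The cross term vanishes because $E\big(\hat X_j(\hat X_j - X_j)\big) = 0$ for the test channel above and $\tilde Z_i$ is independent of all other variables.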

The $(e, R(e), n)$ block code is specified as follows. In round $t \in [1 : m-1]$, node $i = t+1$ uses the Gaussian codes for $X_i$ to describe it to node 1. Node 1 then computes the estimates

$$\hat s_1^n = \frac{1}{m}x_1^n + \frac{1}{m}\sum_{j \in \mathcal{M}\setminus\{1\}} \hat x_j^n$$

and

$$u_i^n = \frac{1}{m}x_1^n + \frac{1}{m}\sum_{j \in \mathcal{M}\setminus\{1,i\}} \hat x_j^n$$

for $i \in \mathcal{M}\setminus\{1\}$, where $\hat x_j^n$ is the description of $x_j^n$. In round $t \in [m-1 : 2m-3]$, node 1 uses the Gaussian code for the source $U_{2m-t-1}$ to describe it to node $2m-t-1$. At the end of round $2m-3$, node $i \in \mathcal{M}\setminus\{1\}$ computes the estimate

$$\hat s_i^n = \frac{1}{m}x_i^n + \hat u_i^n,$$

where $\hat u_i^n$ is the description of $u_i^n$.

In Theorem 1, we already showed that distortion $D \ge (m-1)/m^2$ is achievable using zero rate. Thus, we consider achievability for distortion $D < (m-1)/m^2$.

Using a slight variation on Lemma 1, we can show that the distortion $E(X_i^2)\,d$ between $X_i$ and $\hat X_i$ and the distortion $E(U_i^2)\,d_1$ between $U_i$ and $\hat U_i$ are achievable if

$$r_i(i-1) > I(X_i; \hat X_i) = \frac{1}{2}\log\frac{1}{d},$$

$$r_1(2m-i-1) > I(U_i; \hat U_i) = \frac{1}{2}\log\frac{1}{d_1}$$

for $i \in \mathcal{M}\setminus\{1\}$, $0 < d < 1$, and $0 < d_1 < 1$.
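Thus the per-node transmission rate is

$$R = \frac{1}{m}\left(\frac{m-1}{2}\log\frac{1}{d} + \frac{m-1}{2}\log\frac{1}{d_1}\right) = \frac{m-1}{2m}\log\left(\frac{1}{d\,d_1}\right),$$

and we obtain the upper bound on the network rate distortion function,

$$R^*(D) \le \inf \frac{m-1}{2m}\log\left(\frac{1}{d\,d_1}\right),$$

where the infimum is over all $0 < d < 1$ and $0 < d_1 < 1$ satisfying

$$\frac{1}{m}\left(\frac{m-1}{m^2}\,d + (m-1)\left(\Big(\frac{1}{m^2} + \frac{m-2}{m^2}(1-d)\Big)d_1 + \frac{m-2}{m^2}\,d\right)\right) \le D.$$

Choosing $d = d_1 = m^3 D/(2(m-1)^2)$, which satisfies the above constraint, we obtain the upper bound

$$R^*(D) \le \frac{m-1}{m}\log\left(\frac{2(m-1)^2}{m^3 D}\right).$$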
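
The choice of d and d_1 in the star-network protocol can be sanity-checked numerically. The sketch below assumes base-2 logarithms and uses the per-node distortion expressions of this subsection; the function names are introduced here for illustration and are not part of the original protocol description.

```python
import math


def star_average_distortion(m: int, d: float, d1: float) -> float:
    """Average distortion over all m nodes of the centralized star protocol."""
    node_1 = (m - 1) / m**2 * d
    power_u = (1 + (m - 2) * (1 - d)) / m**2        # E(U_i^2)
    other_node = power_u * d1 + (m - 2) / m**2 * d  # each node i != 1
    return (node_1 + (m - 1) * other_node) / m


def star_upper_bound(m: int, D: float):
    """Rate (bits, assumed log base 2) and d for d = d1 = m^3 D / (2 (m-1)^2)."""
    d = m**3 * D / (2 * (m - 1) ** 2)
    if not 0 < d < 1:
        raise ValueError("normalized distortion d must lie in (0, 1)")
    rate = (m - 1) / (2 * m) * math.log2(1 / (d * d))
    return rate, d
```

With m = 10 and D = 0.001, the resulting average distortion stays below D, and the rate coincides with the closed-form expression of Proposition 3.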



Note that the above upper bound on $R^*(D)$ is not convex and therefore can be improved by time-sharing between the above centralized protocol and the degenerate zero-rate protocol. Note also that, for sufficiently small D, the ratio of the upper bound to the lower bound as $m \to \infty$ is less than or equal to 2. Thus a centralized protocol can achieve a rate within a factor of 2 of the cutset bound.

Remark: Note that this centralized protocol uses the minimum number of rounds T = 2m − 3. This does not imply optimality in terms of rate, however.

Can the factor of 2 between the upper bound and the cutset bound be tightened? It turns out that the cutset bound can be improved for low distortion for trees in general, as illustrated in the following.

Proposition 4: The network rate distortion function when the network is a tree is lower bounded by

$$R^*(D) \ge \frac{m-1}{2m}\log\left(\frac{1}{2m^3D^2}\right).$$



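Proof: In a network with tree topology, removing an edge separates the network into two disconnected subnetworks. Fix T and the edge selection sequence $e \in \mathcal{E}^T$, and consider an $(e, R(e), n)$ block code that achieves distortion $D(e)$. The total transmission rate $mR(e)$ is the total number of bits flowing through all edges in both directions. Let $W_{ij}$ be the collection of indices sent from node i to node j and $R_{ij}(e)$ be the transmission rate for sending this message. Without loss of generality, assume that i = 1, j = m, and removing the edge {1, m} partitions the network $(\mathcal{M}, \mathcal{E})$ into $(\mathcal{M}_1, \mathcal{E}_1)$ and $(\mathcal{M}_2, \mathcal{E}_2)$ such that $\mathcal{M}_1 = \{1, 2, \ldots, l\}$, $\mathcal{M}_2 = \{l+1, l+2, \ldots, m\}$, and $l \ge l_0 = \lceil m/2 \rceil$. We first bound the number of bits flowing from node 1 to node m.

$$\begin{aligned}
nR_{1m}(e) &\ge H(W_{1m}) \ge I\Big(\frac{1}{m}\sum_{i=1}^{l_0} X_i^n; W_{1m} \,\Big|\, X_{l_0+1}^n, X_{l_0+2}^n, \ldots, X_m^n\Big)\\
&= \sum_{k=1}^{n}\left(h\Big(\frac{1}{m}\sum_{i=1}^{l_0} X_{ik}\Big) - h\Big(\frac{1}{m}\sum_{i=1}^{l_0} X_{ik} \,\Big|\, W_{1m}, \frac{1}{m}\sum_{i=1}^{l_0} X_i^{k-1}, X_{l_0+1}^n, X_{l_0+2}^n, \ldots, X_m^n, \hat S_{mk}\Big)\right)\\
&\ge \sum_{k=1}^{n}\left(\frac{1}{2}\log\Big(2\pi e\,\frac{l_0}{m^2}\Big) - \frac{1}{2}\log\Big(2\pi e\,E\big((S_k - \hat S_{mk})^2\big)\Big)\right)\\
&\ge \frac{n}{2}\log\left(\frac{l_0}{m^2 D_m}\right),
\end{aligned}$$

where $D_m = (1/n)\sum_{k=1}^{n} E\big((S_k - \hat S_{mk})^2\big)$, the equality follows from the fact that the sources are independent WGN processes and $\hat S_{mk}$ is a function of $(W_{1m}, X_{l_0+1}^n, X_{l_0+2}^n, \ldots, X_m^n)$, and the last inequality follows by Jensen's inequality. Next we bound the number of bits flowing from node m to node 1. Consider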

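$$\begin{aligned}
nR_{m1}(e) &\ge I\Big(\frac{1}{m}X_m^n; W_{m1} \,\Big|\, X_1^n, X_2^n, \ldots, X_{m-1}^n\Big)\\
&= \sum_{k=1}^{n}\left(h\Big(\frac{1}{m}X_{mk}\Big) - h\Big(\frac{1}{m}X_{mk} \,\Big|\, W_{m1}, \frac{1}{m}X_m^{k-1}, X_1^n, X_2^n, \ldots, X_{m-1}^n, \hat S_{1k}\Big)\right)\\
&\ge \frac{n}{2}\log\left(\frac{1}{m^2 D_1}\right),
\end{aligned}$$

where $D_1 = (1/n)\sum_{k=1}^{n} E\big((S_k - \hat S_{1k})^2\big)$, the equality follows from the fact that the source $X_m$ is a WGN process and $\hat S_{1k}$ is a function of $(W_{m1}, X_1^n, X_2^n, \ldots, X_l^n)$, and the last inequality follows from Jensen's inequality. For any probability mass function $p(e)$ such that $\sum_{e \in \mathcal{E}^T} p(e)D(e) \le D$, where $D(e) = (1/m)\sum_{i=1}^{m} D_i$, the expected per-node transmission rate is lower bounded by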



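$$\sum_{e \in \mathcal{E}^T} \frac{m-1}{m}\,p(e)\big(R_{1m}(e) + R_{m1}(e)\big) \ge \frac{m-1}{2m}\log\left(\frac{l_0}{m^4D^2}\right) \ge \frac{m-1}{2m}\log\left(\frac{1}{2m^3D^2}\right)$$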



for D < l/m^, where the first inequality follows from Jensen's inequality. ■ 

Note that as $D \to 0$, the ratio of the upper bound in Proposition 3 to this lower bound approaches 1. The technique we use to tighten the cutset lower bound for this case, however, cannot be applied to networks with loops.
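
The claim that the star upper bound and the tree lower bound meet as D tends to 0 can be illustrated numerically. This is a minimal sketch assuming base-2 logarithms; `star_upper_bound` and `tree_lower_bound` are names introduced here that simply evaluate the expressions of Propositions 3 and 4.

```python
import math


def star_upper_bound(m: int, D: float) -> float:
    """Upper bound of Proposition 3 (assumed log base 2)."""
    return (m - 1) / m * math.log2(2 * (m - 1) ** 2 / (m**3 * D))


def tree_lower_bound(m: int, D: float) -> float:
    """Lower bound of Proposition 4 for tree networks (assumed log base 2)."""
    return (m - 1) / (2 * m) * math.log2(1 / (2 * m**3 * D**2))


def bound_ratio(m: int, D: float) -> float:
    """Ratio of the star upper bound to the tree lower bound."""
    return star_upper_bound(m, D) / tree_lower_bound(m, D)
```

Evaluating `bound_ratio(10, D)` for decreasing D shows the ratio shrinking toward 1, although the convergence is only logarithmic in 1/D.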



VI. Distributed Weighted-Sum Protocols

Again assume that the sources $(X_1, X_2, \ldots, X_m)$ are independent WGN processes each with average power one. We consider distributed weighted-sum protocols $(T, p(e), R, n, d)$ as defined in Section II. The weighted-sum network rate distortion function $R^*_{\mathrm{WS}}(D)$ is difficult to establish in general. In the following subsection, we establish a lower bound on $R^*_{\mathrm{WS}}(D)$. In Subsection VI-B, we establish upper and lower bounds on $R^*_{\mathrm{GWS}}(D)$ for gossip-based weighted-sum protocols, which in turn establishes an upper bound on $R^*_{\mathrm{WS}}(D)$.

A. Lower Bound on $R^*_{\mathrm{WS}}(D)$

We establish a lower bound on $R^*_{\mathrm{WS}}(D)$ that applies to any network. Consider a distributed weighted-sum protocol $(T, p(e), R, n, d)$ for a given network. Fix an edge selection sequence e and let $t_{ir}$ be the r-th time node i is selected and define

$$\mathcal{T}_i = \{t_{i1}, t_{i2}, \ldots, t_{i\tau_i}\} = \{t : i \in e_t\}, \quad \text{for } i = 1, 2, \ldots, m,$$

where $\tau_i = |\mathcal{T}_i|$ is the number of rounds in which node i is selected. Then the number of rounds T can be expressed as $T = (1/2)\sum_{i=1}^{m} \tau_i$, where the factor of 1/2 is due to the fact that two nodes are selected in each round. We shall need the following properties of the estimate $\hat S_i(t)$ to prove the lower bound.

Lemma 2: For any distributed weighted-sum protocol $(T, p(e), R, n, d)$ and any edge selection sequence $e \in \mathcal{E}^T$, the estimate of node i at the end of round t can be expressed as

$$\hat S_i(t) = \sum_{j=1}^{m} \gamma_{ij}(t)\hat S_j(0) + V_i(t), \quad (4)$$

where $V_i(t)$ is Gaussian and independent of the sources $(X_1, X_2, \ldots, X_m)$. Furthermore, the diagonal coefficients satisfy the following property:

$$\gamma_{ii}(t) \ge \frac{1}{2^r} \quad (5)$$

for $t_{ir} \le t < t_{i,r+1}$ and $r \in [1 : \tau_i]$.

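Proof: When t = 0, we have $\gamma_{ii}(0) = 1$ and $\gamma_{ij}(0) = 0$ for $i \ne j$. Suppose that $\gamma_{ij}(t) \ge 0$ for $i, j \in \mathcal{M}$ and (4) holds up to round t. In round t + 1, assume that the node pair {i, l} is selected. By the update equations in (2), the estimate of node i at the end of this round is

$$\begin{aligned}
\hat S_i(t+1) &= \frac{1}{2}\hat S_i(t) + \frac{1}{2}\hat S_l(t) + \frac{1}{2}Z_l(t)\\
&= \sum_{j=1}^{m}\Big(\frac{1}{2}\gamma_{ij}(t) + \frac{1}{2}\gamma_{lj}(t)\Big)\hat S_j(0) + \frac{1}{2}V_i(t) + \frac{1}{2}V_l(t) + \frac{1}{2}Z_l(t),
\end{aligned}$$

where $Z_l(t)$ is a WGN independent of $(X_1, X_2, \ldots, X_m)$. Thus $V_i(t+1) = (V_i(t) + V_l(t) + Z_l(t))/2$ is independent of the sources $(X_1, X_2, \ldots, X_m)$, and the coefficient $\gamma_{ij}(t+1) = (\gamma_{ij}(t) + \gamma_{lj}(t))/2 \ge 0$. By induction, (4) and $\gamma_{ij}(t) \ge 0$ hold for all $t \in [1 : T]$. Therefore,

$$\gamma_{ii}(t+1) = \frac{1}{2}\gamma_{ii}(t) + \frac{1}{2}\gamma_{li}(t) \ge \frac{1}{2}\gamma_{ii}(t).$$

This can be rewritten as

$$\gamma_{ii}(t+1) \ge \begin{cases} \frac{1}{2}\gamma_{ii}(t) & \text{if } i \in e_{t+1},\\ \gamma_{ii}(t) & \text{if } i \notin e_{t+1}, \end{cases}$$

and we have (5). ■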
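
Property (5) can be illustrated by tracking only the coefficients $\gamma_{ij}(t)$, which do not depend on the noise terms. The sketch below assumes a complete graph with uniformly random edge selection (an assumption made for the illustration only) and checks that each diagonal coefficient never drops below $2^{-r}$, where r is the number of times the node has been selected so far.

```python
import random


def check_diagonal_bound(m: int = 6, rounds: int = 200, seed: int = 1) -> bool:
    """Return True if gamma_ii(t) >= 2^(-times node i was selected) throughout."""
    rng = random.Random(seed)
    # gamma[i][j] is the weight of source j in node i's estimate; identity at t = 0.
    gamma = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    times_selected = [0] * m
    for _ in range(rounds):
        i, l = rng.sample(range(m), 2)  # hypothetical uniform edge selection
        averaged = [(a + b) / 2 for a, b in zip(gamma[i], gamma[l])]
        gamma[i], gamma[l] = averaged, list(averaged)
        times_selected[i] += 1
        times_selected[l] += 1
        for node in range(m):
            if gamma[node][node] < 0.5 ** times_selected[node]:
                return False
    return True
```

The check is expected to succeed for any seed, since the averaging step can at most halve a nonnegative diagonal coefficient.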

Using this lemma, we establish the following lower bound on the number of rounds T for any distributed weighted-sum protocol and any network.

Lemma 3: Given $0 < D < (m-1)/m^2$, if a distributed weighted-sum protocol $(T, p(e), R, n, d)$ achieves distortion D, then

$$T \ge \frac{m}{2}\log\left(\frac{1}{\sqrt{D} + 1/m}\right).$$

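Proof: By Lemma 2, given any edge selection sequence $e \in \mathcal{E}^T$, the estimate of node i at round T is

$$\hat S_i(T) = \sum_{j=1}^{m} \gamma_{ij}(T)\hat S_j(0) + V_i(T).$$

The distortion at node i is

$$E\big((S - \hat S_i(T))^2\big) \ge \Big(\gamma_{ii}(T) - \frac{1}{m}\Big)^2 \ge \Big(\frac{1}{2^{\tau_i}} - \frac{1}{m}\Big)^2.$$

If an $(e, R(e), n)$ block code achieves distortion $D(e)$, then a lower bound on the number of rounds can be found by solving the optimization problem

$$\begin{aligned}
\text{minimize} \quad & \frac{1}{2}\sum_{i=1}^{m} \tau_i\\
\text{subject to} \quad & \frac{1}{m}\sum_{i=1}^{m}\Big(\frac{1}{2^{\tau_i}} - \frac{1}{m}\Big)^2 \le D(e),\\
& \tau_i \ge 0 \text{ for } i \in \mathcal{M},
\end{aligned}$$

where $\tau_i$ is real-valued for $i \in \mathcal{M}$. Let $y_i = 1/2^{\tau_i} \le 1$. The above optimization problem reduces to the convex optimization problem

$$\begin{aligned}
\text{minimize} \quad & -\frac{1}{2}\sum_{i=1}^{m} \log y_i\\
\text{subject to} \quad & \frac{1}{m}\sum_{i=1}^{m}\Big(y_i - \frac{1}{m}\Big)^2 \le D(e),\\
& y_i \le 1 \text{ for } i \in \mathcal{M},
\end{aligned}$$

and the KKT conditions are necessary and sufficient for optimality. The Lagrangian for this problem is

$$L = -\frac{1}{2}\sum_{i=1}^{m}\log y_i + \nu\left(\frac{1}{m}\sum_{i=1}^{m}\Big(y_i - \frac{1}{m}\Big)^2 - D(e)\right) + \sum_{i=1}^{m}\mu_i(y_i - 1)$$

for $\nu \ge 0$ and $\mu_i \ge 0$, $i \in \mathcal{M}$. Setting $\partial L/\partial y_i = 0$, we have $y_i = y_j$ for $i \ne j$. Thus,

$$\tau_i = -\log y_i \ge \log\left(\frac{1}{\sqrt{D(e)} + 1/m}\right),$$

and the minimum number of rounds is lower bounded by $T = (1/2)\sum_{i=1}^{m} \tau_i \ge (m/2)\log\big(1/(\sqrt{D(e)} + 1/m)\big)$. For any distributed weighted-sum protocol that achieves distortion D, there exists at least one $(e, R(e), n)$ block code that achieves distortion $D(e) \le D$. Thus, $T \ge (m/2)\log\big(1/(\sqrt{D} + 1/m)\big)$. ■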
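
The round-count bound of Lemma 3 is straightforward to evaluate. The following sketch assumes logarithms to base 2, which matches the $2^{\tau_i}$ factors in the proof; the function name is hypothetical.

```python
import math


def rounds_lower_bound(m: int, D: float) -> float:
    """Lemma 3 lower bound on the number of rounds T (assumed log base 2)."""
    if not 0 < D < (m - 1) / m**2:
        raise ValueError("Lemma 3 is stated for 0 < D < (m - 1) / m^2")
    # Symmetric solution of the convex problem: y_i = sqrt(D) + 1/m for all i.
    y = math.sqrt(D) + 1 / m
    return m / 2 * math.log2(1 / y)
```

Since the bound is real-valued, an actual protocol would need at least the next integer number of rounds.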

We are now ready to establish the lower bound on $R^*_{\mathrm{WS}}(D)$.

Theorem 2: Given $0 < D < (m-1)/m^2$, then

$$R^*_{\mathrm{WS}}(D) \ge \frac{1}{2}\left(\log\frac{1}{\sqrt{D} + 1/m}\right)\left(\log\frac{1}{4mD}\right).$$

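Proof: Consider a distributed weighted-sum protocol $(T, p(e), R, n, d)$ and fix an edge selection sequence $e \in \mathcal{E}^T$. Suppose that the edge selected at round $t_{ir}$ is {i, j}. Then at the end of this round, the estimate for node j is

$$\hat S_j(t_{ir}) = \frac{1}{2}\hat S_j(t_{ir} - 1) + \frac{1}{2}\hat S_i(t_{ir} - 1) + \frac{1}{2}Z_i(t_{ir} - 1),$$

where $Z_i(t_{ir} - 1)$ is a WGN with average power $E\big(\hat S_i(t_{ir} - 1)^2\big)\,d/(1-d)$. By induction, we can show that the estimate of node l at time $t \ge t_{ir}$ has the form $\hat S_l(t) = (1/2)\beta_l(t)Z_i(t_{ir} - 1) + \bar S_l(t)$, where $\beta_l(t) \ge 0$, $\sum_{l=1}^{m} \beta_l(t) = 1$, and $\bar S_l(t)$ is independent of $Z_i(t_{ir} - 1)$. Now we compute the average distortion at the end of round T:

$$\begin{aligned}
\frac{1}{m}\sum_{l=1}^{m} E\big((S - \hat S_l(T))^2\big) &= \frac{1}{m}\sum_{l=1}^{m}\left(E\Big(\big(\tfrac{1}{2}\beta_l(T)Z_i(t_{ir} - 1)\big)^2\Big) + E\big((S - \bar S_l(T))^2\big)\right)\\
&\ge \frac{1}{4m^2}E\big(Z_i(t_{ir} - 1)^2\big) + \frac{1}{m}\sum_{l=1}^{m} E\big((S - \bar S_l(T))^2\big)\\
&\ge \frac{d}{4m^2(1-d)2^{2(r-1)}} + \frac{1}{m}\sum_{l=1}^{m} E\big((S - \bar S_l(T))^2\big),
\end{aligned}$$

where the first inequality follows by the Cauchy-Schwarz inequality, and the last inequality follows from Lemma 2. We can repeat the above arguments for the second term $(1/m)\sum_{l=1}^{m} E\big((S - \bar S_l(T))^2\big)$ and we obtain

$$\frac{1}{m}\sum_{l=1}^{m} E\big((S - \hat S_l(T))^2\big) \ge \sum_{i=1}^{m}\sum_{r=1}^{\tau_i}\frac{d}{4m^2(1-d)2^{2(r-1)}} \ge \frac{d}{4m}.$$

Since at least one $(e, R(e), n)$ block code has distortion $D(e) \le D$, the normalized distortion is upper bounded by $d \le 4mD$. Thus, by Lemma 1, the average rate is lower bounded by

$$R = \frac{T}{m}\log\frac{1}{d} \ge \frac{1}{2}\left(\log\frac{1}{\sqrt{D} + 1/m}\right)\left(\log\frac{1}{4mD}\right).$$

This completes the proof of the theorem. ■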

Remark: The above lower bound and the cutset bound in Theorem 1 differ in order by roughly a factor of log m, since $\log\big(1/(\sqrt{D} + 1/m)\big)$ is on the order of log m for all $D < (m-1)/m^2$. The fact that a centralized protocol for the star network can come to within a factor of 2 of the cutset bound suggests that the log m factor is the penalty of using distributed rather than centralized protocols.
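
The log m gap between the two lower bounds can be seen numerically. The sketch below assumes base-2 logarithms and evaluates both bounds at D = 1/m², a distortion level picked here purely for illustration; the function names are hypothetical.

```python
import math


def cutset_lower_bound(m: int, D: float) -> float:
    """Theorem 1 bound for D < (m - 1) / m^2 (assumed log base 2)."""
    return 0.5 * math.log2((m - 1) / (m**2 * D))


def weighted_sum_lower_bound(m: int, D: float) -> float:
    """Theorem 2 bound for distributed weighted-sum protocols."""
    return 0.5 * math.log2(1 / (math.sqrt(D) + 1 / m)) * math.log2(1 / (4 * m * D))


def gap(m: int) -> float:
    """Ratio of the Theorem 2 bound to the cutset bound at D = 1 / m^2."""
    D = 1 / m**2
    return weighted_sum_lower_bound(m, D) / cutset_lower_bound(m, D)
```

Comparing `gap(16)` with `gap(1024)` shows the ratio growing with m, in line with the remark above.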

B. Bounds on $R^*_{\mathrm{GWS}}(D)$

In this section, we establish bounds on $R^*_{\mathrm{GWS}}(D)$ for gossip-based weighted-sum protocols $(T, Q, R, n, d)$ defined in Section II. Note that this result also establishes an upper bound on $R^*(D)$ and $R^*_{\mathrm{WS}}(D)$ because $R^*(D) \le R^*_{\mathrm{WS}}(D) \le R^*_{\mathrm{GWS}}(D)$.

Let $\hat S(t) = [\hat S_1(t)\ \hat S_2(t)\ \ldots\ \hat S_m(t)]^\top$ and rewrite the update equations (2) in matrix form as

$$\hat S(t+1) = A(t+1)\hat S(t) + Z(t+1), \quad (6)$$

where (i) $A(t+1)$ is an m × m random matrix such that

$$A(t+1) = A_{ij} = I - \frac{1}{2}(\phi_i - \phi_j)(\phi_i - \phi_j)^\top$$

with probability $(1/m)Q_{ij}$, independent of t, where I is the identity matrix and $\phi_i$ and $\phi_j$ are unit vectors along the i-th and j-th axes, and (ii)

$$Z(t+1) = \frac{1}{2}Z_j(t)\phi_i + \frac{1}{2}Z_i(t)\phi_j,$$

where $Z_i(t)$ and $Z_j(t)$ are WGN sources with average power $E\big(\hat S_i(t)^2\big)\,d/(1-d)$ and $E\big(\hat S_j(t)^2\big)\,d/(1-d)$, respectively, as defined in Section II.

Recall the following properties of the matrix $A(t)$ from [5].

1) $E\big(A(t)^\top A(t)\big) = \bar A$, where $\bar A = E(A(0))$ and the expectation is taken over all $A_{ij}$ with probability $(1/m)Q_{ij}$.

2) $A(t)$ is symmetric positive semidefinite.

3) The largest eigenvalue of $\bar A$ is 1 and the corresponding eigenvector is $\mathbf{1} = [1\ 1\ \ldots\ 1]^\top$.

4) The stochastic matrix Q that minimizes the second largest eigenvalue of $\bar A$ is the solution to the optimization problem

$$\begin{aligned}
\text{minimize} \quad & \lambda_2(\bar A) \quad (7)\\
\text{subject to} \quad & \bar A = \sum_{i=1}^{m}\sum_{j=1}^{m}\frac{1}{m}Q_{ij}A_{ij},\\
& Q_{ij} \ge 0 \text{ for all } i, j,\\
& Q_{ij} = 0 \text{ if } \{i,j\} \notin \mathcal{E},\\
& \sum_{j=1}^{m} Q_{ij} = 1 \text{ for all } i.
\end{aligned}$$

Let $\lambda_2$ be the second largest eigenvalue of the matrix $\bar A$ associated with the optimum matrix $Q^*$, which is a function of the topology of the network.
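
The structure of the matrices $A_{ij}$ and of their average can be checked with a few lines of code. The sketch below is illustrative only: it assumes a complete graph with the uniform choice $Q_{ij} = 1/(m-1)$ for $j \ne i$ (not necessarily the optimal $Q^*$), and all function names are introduced here.

```python
def gossip_matrix(m, i, j):
    """A_ij = I - (1/2)(phi_i - phi_j)(phi_i - phi_j)^T as a list of rows."""
    A = [[1.0 if r == c else 0.0 for c in range(m)] for r in range(m)]
    A[i][i] = A[j][j] = 0.5
    A[i][j] = A[j][i] = 0.5
    return A


def mat_vec(A, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def expected_matrix_complete_graph(m):
    """A-bar for a complete graph, ASSUMING uniform Q_ij = 1/(m-1), j != i."""
    prob = 1.0 / (m * (m - 1))  # (1/m) * Q_ij for each ordered pair
    A_bar = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i != j:
                A = gossip_matrix(m, i, j)
                for r in range(m):
                    for c in range(m):
                        A_bar[r][c] += prob * A[r][c]
    return A_bar
```

Under this assumption the averaged matrix leaves the all-ones vector unchanged and scales any zero-sum vector by 1 − 1/(m − 1), so the second largest eigenvalue equals 1 − 1/(m − 1) for this particular Q.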
Referring to the linear dynamical system in (6), express $\hat S(T)$ as

$$\hat S(T) = A(T,1)\hat S(0) + \sum_{t=1}^{T} A(T,t+1)Z(t) = A(T,1)\hat S(0) + V(T), \quad (8)$$

where $V(T) = \sum_{t=1}^{T} A(T,t+1)Z(t)$ and

$$A(t_2, t_1) = \begin{cases} A(t_2)A(t_2 - 1)\cdots A(t_1) & \text{if } t_2 \ge t_1,\\ I & \text{if } t_2 < t_1. \end{cases}$$

We will need the following lower bound on the number of rounds T to prove the lower bound.

Lemma 4: Given a connected network, if a gossip-based weighted-sum protocol $(T, Q, R, n, d)$ achieves distortion D, then

$$T \ge \frac{m-1}{2}\ln\left(\frac{m-1}{mD}\right).$$

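Proof: Assume that the matrix $\bar A$ has eigenvalues

$$\lambda_1 = 1 \ge \lambda_2 \ge \lambda_3 \ge \ldots \ge \lambda_m$$

with corresponding orthonormal eigenvectors $a_1 = (1/\sqrt{m})\mathbf{1}, a_2, a_3, \ldots, a_m$. We can express the estimate $\hat S(0)$ as $\hat S(0) = \sum_{i=1}^{m} \tilde S_i a_i$, where $\tilde S_i = a_i^\top \hat S(0) \sim \mathcal{N}(0,1)$.

Consider the sum of distortions over all nodes for an $(e, R(e), n)$ block code at the end of round T,

$$\begin{aligned}
E\big(\|\hat S(T) - J\hat S(0)\|^2\big) &= E\big(\|A(T,1)\hat S(0) - J\hat S(0)\|^2\big) + E\big(\|V(T)\|^2\big)\\
&\ge E\Big(E\big(\|A(T,1)\hat S(0) - J\hat S(0)\|^2 \,\big|\, \hat S(0)\big)\Big)\\
&\ge E\Big(E\big(A(T,1)\hat S(0) - J\hat S(0) \,\big|\, \hat S(0)\big)^\top E\big(A(T,1)\hat S(0) - J\hat S(0) \,\big|\, \hat S(0)\big)\Big)\\
&= E\big(\|E(A(T,1))\hat S(0) - J\hat S(0)\|^2\big)\\
&= \sum_{i=2}^{m} \lambda_i^{2T} \ge (m-1)\Big(\frac{1}{m-1}\sum_{i=2}^{m}\lambda_i\Big)^{2T},
\end{aligned}$$

where the second and last inequalities follow from Jensen's inequality. By the definition of the matrix $\bar A$,

$$\sum_{i=1}^{m}\lambda_i = \mathrm{tr}\Big(\sum_{i=1}^{m}\sum_{j=1}^{m}\frac{1}{m}Q_{ij}A_{ij}\Big) = \sum_{i=1}^{m}\sum_{j=1}^{m}\frac{1}{m}Q_{ij}(m-1) = m - 1.$$

Thus, $\sum_{i=2}^{m}\lambda_i = m - 2$. To achieve distortion $D(e)$,

$$T \ge \frac{\ln\big((m-1)/(mD(e))\big)}{2\ln\big((m-1)/(m-2)\big)} \ge \frac{m-1}{2}\ln\left(\frac{m-1}{mD(e)}\right).$$

Since at least one $(e, R(e), n)$ block code achieves distortion $D(e) \le D$,

$$T \ge \frac{m-1}{2}\ln\left(\frac{m-1}{mD}\right).$$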



The following lemma is useful for calculating the norm squared of vectors relating to $A(t)$.

Lemma 5: For any random vector Y, independent of $A(t)$, we have

(i) $E\big(\|A(t)Y\|^2\big) \le \lambda_2(\bar A)\,E\big(\|Y - JY\|^2\big) + E\big(\|JY\|^2\big)$, and

(ii) $E\big(\|A(t)Y - JY\|^2\big) \le \lambda_2(\bar A)\,E\big(\|Y - JY\|^2\big)$,

where $J = (1/m)\mathbf{1}\mathbf{1}^\top$.

The proof of the lemma is given in the Appendix.

The following lemma gives an upper bound on the average distortion for a gossip-based weighted-sum protocol. 

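Lemma 6: The average per-letter distortion of the gossip-based weighted-sum protocol $(T, Q, R, n, d)$ is upper bounded by

$$\frac{1}{m}E\big(\|\hat S(T) - J\hat S(0)\|^2\big) \le \frac{1}{m^2}\big((1+u)^T - 1\big) + \frac{1}{m}\,\frac{u}{1 - \lambda_2 + u}(1+u)^T + \frac{1}{m}\,\frac{u}{1 - \lambda_2 - u} + (\lambda_2 + u)^T,$$

where $u = d/(2m(1-d))$.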

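Proof: Consider the sum of distortions

$$\begin{aligned}
E\big(\|\hat S(T) - J\hat S(0)\|^2\big) &= E\big(\|A(T,1)\hat S(0) - J\hat S(0)\|^2\big) + E\big(\|V(T)\|^2\big)\\
&= E\big(\|A(T,1)\hat S(0) - J\hat S(0)\|^2\big) + \sum_{t=1}^{T} E\big(\|A(T,t+1)Z(t)\|^2\big),
\end{aligned}$$

where the first term corresponds to the sum distortion for the infinite-rate gossip algorithm and the second term is contributed by quantization distortions. We first find an upper bound on the first term using Lemma 5. Consider

$$\begin{aligned}
E\big(\|A(T,1)\hat S(0) - J\hat S(0)\|^2\big) &= E\big(\|A(T,1)\hat S(0) - JA(T-1,1)\hat S(0)\|^2\big)\\
&\le \lambda_2\,E\big(\|A(T-1,1)\hat S(0) - JA(T-1,1)\hat S(0)\|^2\big)\\
&= \lambda_2\,E\big(\|A(T-1,1)\hat S(0) - J\hat S(0)\|^2\big)\\
&\le \lambda_2^T\,E\big(\|\hat S(0) - J\hat S(0)\|^2\big) = (m-1)\lambda_2^T.
\end{aligned}$$

Next we consider

$$\begin{aligned}
E\big(\|A(T,t+1)Z(t)\|^2\big) &\le \lambda_2\,E\big(\|A(T-1,t+1)Z(t) - JA(T-1,t+1)Z(t)\|^2\big) + E\big(\|JA(T-1,t+1)Z(t)\|^2\big)\\
&= \lambda_2\,E\big(\|A(T-1,t+1)Z(t) - JA(T-2,t+1)Z(t)\|^2\big) + E\big(\|JZ(t)\|^2\big)\\
&\le \lambda_2^2\,E\big(\|A(T-2,t+1)Z(t) - JA(T-2,t+1)Z(t)\|^2\big) + E\big(\|JZ(t)\|^2\big)\\
&\le \lambda_2^{T-t}\,E\big(\|Z(t) - JZ(t)\|^2\big) + E\big(\|JZ(t)\|^2\big)\\
&= \Big(\frac{m-1}{m}\lambda_2^{T-t} + \frac{1}{m}\Big)E\big(\|Z(t)\|^2\big).
\end{aligned}$$

By the definition of the vector $Z(t)$, we have

$$\begin{aligned}
E\big(\|Z(t)\|^2\big) &= \frac{d}{2m(1-d)}E\big(\|\hat S(t-1)\|^2\big)\\
&= u\Big(E\big(\|A(t-1,1)\hat S(0)\|^2\big) + E\big(\|V(t-1)\|^2\big)\Big)\\
&\le u\Big(\lambda_2\,E\big(\|A(t-2,1)\hat S(0) - JA(t-2,1)\hat S(0)\|^2\big) + E\big(\|JA(t-2,1)\hat S(0)\|^2\big) + E\big(\|V(t-1)\|^2\big)\Big)\\
&\le u\Big(\lambda_2^{t-1}E\big(\|\hat S(0) - J\hat S(0)\|^2\big) + E\big(\|J\hat S(0)\|^2\big) + E\big(\|V(t-1)\|^2\big)\Big)\\
&= u\Big(1 + (m-1)\lambda_2^{t-1} + E\big(\|V(t-1)\|^2\big)\Big).
\end{aligned}$$

Combining the above inequalities, we obtain

$$E\big(\|V(t)\|^2\big) \le \sum_{\tau=1}^{t} u\Big(\frac{1}{m} + \frac{m-1}{m}\lambda_2^{t-\tau}\Big)\Big(1 + (m-1)\lambda_2^{\tau-1} + E\big(\|V(\tau-1)\|^2\big)\Big).$$

Suppose that for $\tau = 1, 2, \ldots, t$

$$E\big(\|V(\tau)\|^2\big) \le (1+u)^\tau + (m-1)(\lambda_2 + u)^\tau - 1 - (m-1)\lambda_2^\tau, \quad (9)$$

then

$$\begin{aligned}
E\big(\|V(t+1)\|^2\big) &\le \sum_{\tau=1}^{t+1} u\Big(\frac{1}{m} + \frac{m-1}{m}\lambda_2^{t+1-\tau}\Big)\Big((1+u)^{\tau-1} + (m-1)(\lambda_2 + u)^{\tau-1}\Big)\\
&\le u\,\frac{(1+u)^{t+1} - 1}{u} + u(m-1)\frac{(\lambda_2 + u)^{t+1} - \lambda_2^{t+1}}{u}\\
&= (1+u)^{t+1} + (m-1)(\lambda_2 + u)^{t+1} - 1 - (m-1)\lambda_2^{t+1}.
\end{aligned}$$

By induction, (9) holds for $\tau = 1, 2, \ldots, T-1$, thus

$$\begin{aligned}
E\big(\|V(T)\|^2\big) &\le \sum_{t=1}^{T} u\Big(\frac{1}{m} + \frac{m-1}{m}\lambda_2^{T-t}\Big)\Big((1+u)^{t-1} + (m-1)(\lambda_2 + u)^{t-1}\Big)\\
&= \frac{1}{m}\big((1+u)^T - 1\big) + \frac{m-1}{m}\,\frac{u}{1 - \lambda_2 + u}\big((1+u)^T - \lambda_2^T\big)\\
&\quad + \frac{m-1}{m}\,\frac{u}{1 - \lambda_2 - u}\big(1 - (\lambda_2 + u)^T\big) + \frac{(m-1)^2}{m}\big((\lambda_2 + u)^T - \lambda_2^T\big).
\end{aligned}$$

Therefore, we have the upper bound on distortion

$$\begin{aligned}
\frac{1}{m}E\big(\|\hat S(T) - J\hat S(0)\|^2\big) &\le \frac{m-1}{m}\lambda_2^T + \frac{1}{m^2}\big((1+u)^T - 1\big) + \frac{m-1}{m^2}\,\frac{u}{1 - \lambda_2 + u}\big((1+u)^T - \lambda_2^T\big)\\
&\quad + \frac{m-1}{m^2}\,\frac{u}{1 - \lambda_2 - u}\big(1 - (\lambda_2 + u)^T\big) + \Big(\frac{m-1}{m}\Big)^2\big((\lambda_2 + u)^T - \lambda_2^T\big)\\
&\le \frac{1}{m^2}\big((1+u)^T - 1\big) + \frac{1}{m}\,\frac{u}{1 - \lambda_2 + u}(1+u)^T + \frac{1}{m}\,\frac{u}{1 - \lambda_2 - u} + (\lambda_2 + u)^T.
\end{aligned}$$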



We are now ready to establish the bounds on $R^*_{\mathrm{GWS}}(D)$.

Theorem 3: Given a connected network with associated eigenvalue $\lambda_2$, then

(i) if a gossip-based weighted-sum protocol achieves distortion $D < 1/(4m)$, then

$$R^*_{\mathrm{GWS}}(D) \ge \frac{m-1}{2m}\left(\ln\frac{m-1}{mD}\right)\left(\log\frac{1}{4mD}\right), \text{ and}$$

(ii) there exists an $m(D)$ and a gossip-based weighted-sum protocol such that for all $m \ge m(D)$,

$$R^*_{\mathrm{GWS}}(D) \le \frac{1}{m\bar\lambda_2}\left(\ln\frac{2}{D}\right)\left(\log\frac{\ln(2/D)}{m^2\bar\lambda_2 D}\right),$$

where $\bar\lambda_2 = 1 - \lambda_2$.

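Proof:

(i) The distortion analysis is the same as in Theorem 2. Thus, $d/(4m) \le D$. Using Lemmas 1 and 4, we have the following lower bound on the expected per-node transmission rate:

$$R = \frac{T}{m}\log\Big(\frac{1}{d}\Big) \ge \frac{m-1}{2m}\left(\ln\frac{m-1}{mD}\right)\left(\log\frac{1}{4mD}\right).$$

(ii) If the distortion $D \ge (m-1)/m^2$, then zero rate is achievable for T = 0 as discussed in Section V. Otherwise, we choose the optimal stochastic matrix $Q^*$ with eigenvalue $\lambda_2$ according to (7) for the given network topology.

For $D < (m-1)/m^2$, we need to show that

$$\lim_{m \to \infty}\frac{1}{mD}E\big(\|\hat S(T) - J\hat S(0)\|^2\big) < 1.$$

By Lemma 6, it suffices to show that

$$\lim_{m \to \infty}\left(\frac{1}{m^2D}\big((1+u)^T - 1\big) + \frac{1}{mD}\,\frac{u}{\bar\lambda_2 + u}(1+u)^T + \frac{1}{mD}\,\frac{u}{\bar\lambda_2 - u} + \frac{1}{D}(1 - \bar\lambda_2 + u)^T\right) < 1.$$

We set

$$T = \frac{1}{\bar\lambda_2}\ln\Big(\frac{2}{D}\Big), \qquad d = \frac{m^2\bar\lambda_2 D}{\ln(2/D)}.$$

First consider

$$\lim_{m \to \infty}\frac{1}{mD}\big((1+u)^T - 1\big) = \lim_{m \to \infty}\frac{1}{mD}\left(\Big(1 + \frac{m\bar\lambda_2 D}{2(1-d)\ln(2/D)}\Big)^{(1/\bar\lambda_2)\ln(2/D)} - 1\right) = \begin{cases} \frac{1}{c}\big(e^{c/2} - 1\big) & \text{if } \lim_{m \to \infty} mD = c,\\ \frac{1}{2} & \text{if } \lim_{m \to \infty} mD = 0, \end{cases}$$

$$\lim_{m \to \infty}(1+u)^T = \lim_{m \to \infty}\Big(1 + \frac{m\bar\lambda_2 D}{2(1-d)\ln(2/D)}\Big)^{(1/\bar\lambda_2)\ln(2/D)} \le e^{1/2},$$

$$\lim_{m \to \infty}\frac{1}{mD}\Big(\frac{u}{\bar\lambda_2 \pm u}\Big) = \lim_{m \to \infty}\frac{1}{mD}\left(\frac{m\bar\lambda_2 D/(2(1-d)\ln(2/D))}{\bar\lambda_2 \pm m\bar\lambda_2 D/(2(1-d)\ln(2/D))}\right) = 0,$$

$$\lim_{m \to \infty}\frac{1}{D}(1 - \bar\lambda_2 + u)^T = \lim_{m \to \infty}\frac{1}{D}\Big(1 - \bar\lambda_2 + \frac{m\bar\lambda_2 D}{2(1-d)\ln(2/D)}\Big)^{(1/\bar\lambda_2)\ln(2/D)} \le \frac{e^{1/2}}{2}.$$

Letting $m \to \infty$, we obtain the following ratio of the expected distortion to the given distortion:

$$\lim_{m \to \infty}\frac{1}{mD}E\big(\|\hat S(T) - J\hat S(0)\|^2\big) \le \frac{e^{1/2}}{2} < 1.$$

Therefore, by Lemma 1, the average distortion D is achievable for average rate

$$\frac{T}{m}\log\frac{1}{d} = \frac{1}{m\bar\lambda_2}\left(\ln\frac{2}{D}\right)\left(\log\frac{\ln(2/D)}{m^2\bar\lambda_2 D}\right).$$

■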
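
The parameter choice in part (ii) can be explored numerically. The following sketch assumes base-2 logarithms for the rate and natural logarithms where the text writes ln; `gap` stands for $\bar\lambda_2 = 1 - \lambda_2$ and, like the function names, is a label introduced here for illustration.

```python
import math


def gossip_parameters(m: int, D: float, gap: float):
    """Return (T, d) with T = ln(2/D) / gap and d = m^2 * gap * D / ln(2/D)."""
    log_term = math.log(2 / D)
    T = log_term / gap              # real-valued; a protocol would round up
    d = m**2 * gap * D / log_term   # normalized local distortion
    return T, d


def gossip_rate_upper_bound(m: int, D: float, gap: float) -> float:
    """Per-node rate (T / m) * log2(1 / d) for the parameters above."""
    T, d = gossip_parameters(m, D, gap)
    if not 0 < d < 1:
        raise ValueError("this choice of parameters requires 0 < d < 1")
    return T / m * math.log2(1 / d)
```

For a complete graph with m = 50, taking `gap = 1/49` and D = 0.001 gives a valid d in (0, 1), and the product uT equals mD/(2(1 − d)), which is the exponent that drives the limits in the proof.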

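Note that the upper bound on $R^*_{\mathrm{GWS}}(D)$ is not in general convex and therefore can be improved by time-sharing.

Remark: The above upper bound can be improved for distortion decreasing slowly in m by choosing higher d. Specifically, by choosing $T = (1/\bar\lambda_2)\ln(3/D)$ and $d = 2m^2\bar\lambda_2 D/\ln m$ for distortion $D = \Omega(1/(m\log m))$, and $T = (1/\bar\lambda_2)\ln(3/D)$ and $d = m^2\bar\lambda_2 D/4$ for distortion $D = o(1/(m\log m))$ and $D = \Omega(m^{-c})$, $c > 0$, we obtain the following tighter upper bounds:

$$R^*_{\mathrm{GWS}}(D) \le \begin{cases} \dfrac{1}{m\bar\lambda_2}\left(\ln\dfrac{3}{D}\right)\left(\log\dfrac{\ln m}{2m^2\bar\lambda_2 D}\right) & \text{if } D = \Omega\Big(\dfrac{1}{m\log m}\Big),\\[2ex] \dfrac{1}{m\bar\lambda_2}\left(\ln\dfrac{3}{D}\right)\left(\log\dfrac{4}{m^2\bar\lambda_2 D}\right) & \text{if } D = o\Big(\dfrac{1}{m\log m}\Big) \text{ and } D = \Omega(m^{-c}). \end{cases}$$

Note that these upper bounds differ from the upper bound given in Theorem 3 but have the same order.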

Since Gaussian sources are the hardest to compress, we can show that the above upper bound is also an upper 
bound for general, non-Gaussian sources. 

Corollary 1: The upper bound in Theorem 3 holds for general non-Gaussian i.i.d. sources $(X_1, X_2, \ldots, X_m)$ each with average power one.

Proof: Assume that Gaussian test channels with independent additive WGN are used. Then the analysis of distortion is the same as in the proof of Theorem 3. In round t + 1, the rate of node $i \in e_{t+1}$, with test-channel output $\check S_i(t) = (1-d)(\hat S_i(t) + Z_i(t))$, is

$$\begin{aligned}
I\big(\hat S_i(t); \check S_i(t)\big) &= h\big(\check S_i(t)\big) - h\big(\check S_i(t) \mid \hat S_i(t)\big)\\
&= h\big((1-d)(\hat S_i(t) + Z_i(t))\big) - h\big((1-d)Z_i(t)\big)\\
&= h\big((1-d)(\hat S_i(t) + Z_i(t))\big) - \frac{1}{2}\log\Big(2\pi e\,E\big(\hat S_i(t)^2\big)\,d(1-d)\Big)\\
&\le \frac{1}{2}\log\Big(2\pi e\,E\big(\hat S_i(t)^2\big)(1-d)\Big) - \frac{1}{2}\log\Big(2\pi e\,E\big(\hat S_i(t)^2\big)\,d(1-d)\Big)\\
&= \frac{1}{2}\log\frac{1}{d},
\end{aligned}$$

which is less than or equal to the rate for Gaussian sources. Thus, the upper bound on the expected network rate distortion function for WGN sources in Theorem 3 is also an upper bound for non-Gaussian i.i.d. sources with the same average powers. ■
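
The inequality in the proof rests on the fact that, for a fixed variance, the Gaussian distribution maximizes differential entropy. The sketch below illustrates this with a uniform distribution as a stand-in non-Gaussian source (an example chosen here, not taken from the text), assuming entropies measured in bits.

```python
import math


def gaussian_entropy_bits(variance: float) -> float:
    """Differential entropy of a Gaussian with the given variance, in bits."""
    return 0.5 * math.log2(2 * math.pi * math.e * variance)


def uniform_entropy_bits(variance: float) -> float:
    """Differential entropy of a zero-mean uniform with the given variance."""
    half_width = math.sqrt(3 * variance)  # uniform on [-a, a] has variance a^2 / 3
    return math.log2(2 * half_width)


def per_round_rate_bits(d: float) -> float:
    """Per-round rate (1/2) log2(1/d) that upper bounds the mutual information."""
    return 0.5 * math.log2(1 / d)
```

For unit variance the uniform entropy is strictly smaller than the Gaussian one, which is the mechanism that makes (1/2) log(1/d) an upper bound on the per-round rate for non-Gaussian sources.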

Remarks:

1) For a complete graph, it can be easily shown that $\lambda_2 = 1 - 1/(m-1)$. Thus the upper and lower bounds of Theorem 3 give

$$R^*_{\mathrm{GWS}}(D) \le \frac{m-1}{m}\left(\ln\frac{2}{D}\right)\left(\log\frac{(m-1)\ln(2/D)}{m^2D}\right)$$

and

$$R^*_{\mathrm{GWS}}(D) \ge \frac{m-1}{2m}\left(\ln\frac{m-1}{mD}\right)\left(\log\frac{1}{4mD}\right).$$

These two bounds differ by a factor of log log m for distortion $D = \Omega(1/(m\log m))$ and a constant factor elsewhere. It is likely that the factor of log log m is the result of looseness in our lower bound. On the other hand, the lower bound of Theorem 2 gives

$$R^*_{\mathrm{WS}}(D) \ge \frac{1}{2}\left(\log\frac{1}{\sqrt{D} + 1/m}\right)\left(\log\frac{1}{4mD}\right),$$

which is also a lower bound on $R^*_{\mathrm{GWS}}(D)$. The above two lower bounds differ only by a constant factor for $D = \Omega(m^{-c})$ and $c > 0$ and by a factor of $\log(1/D)/\log m$ for $D = o(m^{-c})$ and $c > 0$.

2) For the star network considered in Subsection V-C, $\lambda_2 = 1 - 1/(2(m-1))$ and the upper bound gives

$$R^*_{\mathrm{GWS}}(D) \le \frac{2(m-1)}{m}\left(\ln\frac{2}{D}\right)\left(\log\frac{2(m-1)\ln(2/D)}{m^2D}\right).$$

On the other hand, the upper bound in Subsection V-C gives

$$R^*(D) \le \frac{m-1}{m}\log\left(\frac{2(m-1)^2}{m^3D}\right).$$

These two bounds differ by a factor of (log log m) log m for distortion $D = \Omega(1/(m\log m))$ and log m for $D = o(1/(m\log m))$, $D = \Omega(m^{-c})$, and $c > 0$. The log m factor represents the penalty of using the gossip-based distributed protocols.



C. Network Rate-Distortion Function with Side Information 

In Subsections VI-A and VI-B, we did not consider the correlation (side information) between the node estimates 
in computing the transmission rate. While Theorem 3 remains an upper bound when side information is considered, 
the lower bounds in Theorems 2 and 3 do not necessarily hold. To explore the effect of side information on rate, we 
consider the gossip-based weighted-sum protocol $(T, Q, R, n, d)$ with Wyner-Ziv coding. Suppose that edge {i, j} is selected in round t + 1. By the Wyner-Ziv theorem, the transmission rate in each round can be reduced from $(1/2)\log(1/d)$ to

^ = 2 [ d ) ' 
[Figure 1: plot against distortion D (horizontal axis from 0.002 to 0.02) with curves labeled "Simulation", "Simulation (Wyner-Ziv)", and "Upper bound (Theorem 3)".]

Fig. 1. Comparison of the estimated expected network rate-distortion function $R^*_{\mathrm{GWS}}(D)$ with and without Wyner-Ziv coding to the upper bound in Theorem 3 for m = 50.

Figure 1 compares the simulated $R^*_{\mathrm{GWS}}(D)$ with and without Wyner-Ziv coding to the upper bound in Theorem 3 for a complete graph with m = 50. Note that the improvement in rate drops from 35% at distortion D < 0.0004 to around 15% for high distortion 0.004 < D < 0.02.

VII. Conclusion 

We introduced a lossy source coding formulation for the distributed averaging problem. We established R*{D) 
for the 2-node network with correlated WGN sources and a general cutset lower bound for independent WGN 
sources. The cutset bound is achieved within a factor of 2 using a centralized protocol in a star network. We then 
established a lower bound on the network rate distortion function for the class of distributed weighted-sum protocols 
and bounds on the network rate distortion function for gossip-based weighted-sum protocols. The bounds differ by 
a factor of only log log m for a complete graph network. The results provide insights into the fundamental limits
on distributed averaging and on the penalty of using a distributed protocol.

There are many questions that would be interesting to explore. For example: (i) We showed that the cutset bound
can be improved for tree networks. Can it be improved in general, or even for simple networks with loops such as 
a ring? (ii) Is the log log m factor in the upper bound for the gossip-based weighted-sum protocols necessary? Can 
the lower bound be tightened? (iii) We have investigated distributed weighted-sum protocols with a time-invariant 
normalized local distortion d. Can the order of the rate be reduced by letting d vary with time? (iv) The distributed 
weighted-sum protocols as defined in the paper do not take advantage of the build-up of correlation in the network. 
Using Wyner-Ziv coding can indeed reduce the rate as demonstrated in Subsection VI-C. It would be interesting 
to find bounds with side information. 
Appendix
Proof of Lemmas and Propositions

Proof of the converse of Proposition 1: In each time slot, the index sent by node 1 is a function of the source $X_1^n$ and the indices sent by node 2 in past time slots. Let $W_1$ and $W_2$ be the collections of indices sent by nodes 1 and 2 over all time slots, respectively, and consider

nRi > H{Wi) > H{Wi\X^i) 
= IiX{\-W^\X^,) 

n 

= 5] (^h{Xik\X^^\X^,) - h{X^k\X'[r\X^„Wi)) 

k=l 

n 

Wi)) 

k=l 

= ^ log (^27re (^Pi - - ^ h{Xik\X2k, U,k), (10) 

where Uik = {Wi,Xt\X2l+,). 

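To bound the second term, we consider the distortion between the estimate $\hat g_2^n$ of node 2 and the weighted sum $g^n$. The estimate is a function of $(W_1, X_2^n)$, and hence of $(U_{1k}, X_{2k})$, and the distortion is equal to

$$\begin{aligned}
\frac{1}{n}\sum_{k=1}^{n} E\Big(\big(g(X_{1k}, X_{2k}) - \hat g_{2k}(U_{1k}, X_{2k})\big)^2\Big) &\ge \frac{1}{n}\sum_{k=1}^{n} E\big(\mathrm{Var}(g(X_{1k}, X_{2k}) \mid U_{1k}, X_{2k})\big)\\
&\ge \frac{1}{n}\sum_{k=1}^{n} E\big(\mathrm{Var}(a_1X_{1k} + a_2X_{2k} \mid U_{1k}, X_{2k})\big)\\
&= \frac{1}{n}\sum_{k=1}^{n} a_1^2\,E\big(\mathrm{Var}(X_{1k} \mid U_{1k}, X_{2k})\big).
\end{aligned}$$

Let the average distortion between $\hat g_2^n$ and $g^n$ be $D_2$. We can bound the second term in (10) as

$$\begin{aligned}
\sum_{k=1}^{n} h(X_{1k} \mid U_{1k}, X_{2k}) &\le \sum_{k=1}^{n}\frac{1}{2}\log\Big(2\pi e\,E\big(\mathrm{Var}(X_{1k} \mid U_{1k}, X_{2k})\big)\Big)\\
&\le \frac{n}{2}\log\Big(2\pi e\,\frac{1}{n}\sum_{k=1}^{n} E\big(\mathrm{Var}(X_{1k} \mid U_{1k}, X_{2k})\big)\Big)\\
&\le \frac{n}{2}\log\Big(2\pi e\,\frac{D_2}{a_1^2}\Big).
\end{aligned}$$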

The transmission rate of node 1 is lower bounded by 



2 °V ^2 

Similarly, let the average distortion between 5"^ and g'^ be -Di, then 

^^2 > T^log ' ' ^ 



2 ° V i:> 

The per-node transmission rate is 



^ = o (-^1 + -^^2) > o lor ' 



2 ' ' - 2 V VWD: 
The network rate distortion function is lower bounded by the per-node transmission rate minimized over all distortions $D_1$ and $D_2$ satisfying $(D_1 + D_2)/2 \le D$. The minimum is achieved for $D_1 = D_2 = D$, and thus the lower bound is

fi-(0)>ilog( °-°-"-^''^ ). 

■ 

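Proof of Lemma 5:

(i) By the assumption that Y is independent of $A(t)$, we have

$$\begin{aligned}
E\big(\|A(t)Y\|^2\big) &= E\big(Y^\top A(t)^\top A(t)Y\big)\\
&= E\Big(E\big(Y^\top A(t)^\top A(t)Y \mid Y\big)\Big)\\
&= E\Big(Y^\top E\big(A(t)^\top A(t)\big)Y\Big)\\
&= E\big(Y^\top \bar A Y\big)\\
&= E\big((Y - JY + JY)^\top \bar A (Y - JY + JY)\big)\\
&= E\big((Y - JY)^\top \bar A (Y - JY)\big) + E\big(Y^\top J^\top \bar A J Y\big)\\
&\le \lambda_2(\bar A)\,E\big((Y - JY)^\top (Y - JY)\big) + E\big(Y^\top J^\top J Y\big)\\
&= \lambda_2(\bar A)\,E\big(\|Y - JY\|^2\big) + E\big(\|JY\|^2\big).
\end{aligned}$$

(ii) We consider the norm squared of $A(t)Y - JY$:

$$\begin{aligned}
E\big(\|A(t)Y - JY\|^2\big) &= E\big((A(t)Y - JY)^\top (A(t)Y - JY)\big)\\
&= E\big((Y - JY)^\top A(t)^\top A(t)(Y - JY)\big)\\
&= E\big(\|A(t)(Y - JY)\|^2\big).
\end{aligned}$$

By (i) and $J(Y - JY) = 0$, we have

$$E\big(\|A(t)(Y - JY)\|^2\big) \le \lambda_2(\bar A)\,E\big(\|Y - JY\|^2\big).$$

This completes the proof.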